ADAPTIVE DENSITY ESTIMATION FOR GENERAL ARCH MODELS 



F. COMTE, J. DEDECKER, AND M. L. TAUPIN

Abstract. We consider a model $Y_t = \sigma_t \eta_t$ in which $(\sigma_t)$ is not independent of the noise
process $(\eta_t)$, but $\sigma_t$ is independent of $\eta_t$ for each $t$. We assume that $(\sigma_t)$ is stationary and
we propose an adaptive estimator of the density of $\ln(\sigma_t^2)$ based on the observations $Y_t$.
Under various dependence structures, the rates of this nonparametric estimator coincide
with the minimax rates obtained in the i.i.d. case when $(\sigma_t)$ and $(\eta_t)$ are independent,
in all cases where these minimax rates are known. The results apply to various linear
and nonlinear ARCH processes.

1. Introduction

In this paper, we consider the following general ARCH-type model: $((Y_t, \sigma_t))_{t \geq 0}$ is a
strictly stationary sequence of $\mathbb{R} \times \mathbb{R}^+$-valued random variables, satisfying the equation

(1.1)  $Y_t = \sigma_t \eta_t$,

where $(\eta_t)_{t \in \mathbb{Z}}$ is a sequence of independent and identically distributed (i.i.d.) random
variables with mean zero and finite variance, and for each $t \geq 0$, the random vector
$(\sigma_i, \eta_{i-1})_{0 \leq i \leq t}$ is independent of the sequence $(\eta_i)_{i \geq t}$.

The model is classically rewritten via a logarithmic transformation:

(1.2)  $Z_t = X_t + \varepsilon_t$,

where $Z_t = \ln(Y_t^2)$, $X_t = \ln(\sigma_t^2)$ and $\varepsilon_t = \ln(\eta_t^2)$. In the context derived from the model
(1.1), $X_t$ and $\varepsilon_t$ are independent for a given $t$, whereas the processes $(X_t)_{t \geq 0}$ and $(\varepsilon_t)_{t \in \mathbb{Z}}$
are not independent.
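
As an illustration of the transformation (1.1)-(1.2), the following minimal sketch simulates one particular model of this type and checks the additive decomposition numerically. The recursion $\sigma_t^2 = a + b Y_{t-1}^2$, the parameter values, the Gaussian noise and the function name `simulate_arch1` are assumptions made only for this example; the chain is started at $Y_0 = 0$ without burn-in, so it is not exactly stationary.

```python
import math
import random

def simulate_arch1(n, a=0.5, b=0.5, seed=0):
    """Simulate Y_t = sigma_t * eta_t with sigma_t^2 = a + b * Y_{t-1}^2 (illustrative choice)."""
    rng = random.Random(seed)
    y_prev = 0.0
    Y, sigma2, eta = [], [], []
    for _ in range(n):
        s2 = a + b * y_prev ** 2      # sigma_t^2 depends only on past noise
        e = rng.gauss(0.0, 1.0)       # eta_t is drawn independently of sigma_t
        y_prev = math.sqrt(s2) * e
        Y.append(y_prev)
        sigma2.append(s2)
        eta.append(e)
    return Y, sigma2, eta

Y, sigma2, eta = simulate_arch1(1000)
Z = [math.log(y * y) for y in Y]          # observed:   Z_t = ln(Y_t^2)
X = [math.log(s) for s in sigma2]         # unobserved: X_t = ln(sigma_t^2)
eps = [math.log(e * e) for e in eta]      # noise:      eps_t = ln(eta_t^2)
```

In practice only `Z` would be available; `X` and `eps` are kept here to make the identity $Z_t = X_t + \varepsilon_t$ checkable.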

Our aim is the adaptive estimation of $g$, the common density of the unobserved
variables $X_t = \ln(\sigma_t^2)$, when the density $f_\varepsilon$ of $\varepsilon_t = \ln(\eta_t^2)$ is known. More precisely we
shall build an estimator of $g$ without any prior knowledge of its smoothness, using the
observations $Z_t = \ln(Y_t^2)$ and the knowledge of the convolution kernel $f_\varepsilon$. Since $X_t$ and $\varepsilon_t$
are independent for each $t$, the common density $f_Z$ of the $Z_t$'s is given by the convolution
equation $f_Z = g * f_\varepsilon$.

In many papers dealing with ARCH models, $\varepsilon_t$ is assumed to be Gaussian or the log of
a squared Gaussian (when $\eta_t$ is Gaussian, see van Es et al. (2005) or, in slightly different
contexts, van Es et al. (2003) and Comte and Genon-Catalot). Our setting is more
general since we consider various types of error densities. More precisely, we assume that
$f_\varepsilon$ belongs to some class of smooth functions described below: there exist nonnegative
numbers $\kappa_0$, $\kappa_0'$, $\gamma$, $\mu$, and $\delta$ such that the Fourier transform $f_\varepsilon^*$ of $f_\varepsilon$ satisfies

(1.3)  $\kappa_0 (x^2+1)^{-\gamma/2} \exp\{-\mu |x|^\delta\} \leq |f_\varepsilon^*(x)| \leq \kappa_0' (x^2+1)^{-\gamma/2} \exp\{-\mu |x|^\delta\}$.

Since $f_\varepsilon$ is known, the constants $\mu$, $\delta$, $\kappa_0$, and $\gamma$ defined in (1.3) are known. When $\delta = 0$
in (1.3), the errors are called "ordinary smooth" errors. When $\mu > 0$ and $\delta > 0$, they are
called "super smooth". The standard examples for super smooth densities are Gaussian or
Cauchy distributions (super smooth of order $\gamma = 0$, $\delta = 2$ and $\gamma = 0$, $\delta = 1$ respectively).
When $\varepsilon_t = \ln(\eta_t^2)$ with $\eta_t \sim \mathcal{N}(0,1)$ as in van Es et al. (2003, 2005), then $\varepsilon_t$ is super
smooth with $\delta = 1$, $\gamma = 0$ and $\mu = \pi/2$. An example of ordinary smooth density is the
Laplace distribution, for which $\delta = \mu = 0$ and $\gamma = 2$.
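
The super smooth orders quoted for $\varepsilon_t = \ln(\eta_t^2)$ with Gaussian $\eta_t$ can be checked numerically. The sketch below relies on the classical identity $|\Gamma(1/2 + ix)|^2 = \pi/\cosh(\pi x)$, which gives $|f_\varepsilon^*(x)| = \cosh(\pi x)^{-1/2} \sim \sqrt{2}\, e^{-\pi |x|/2}$, that is $\delta = 1$, $\gamma = 0$, $\mu = \pi/2$ in (1.3). The function names and the Monte Carlo sample size are choices made for this example only.

```python
import cmath
import math
import random

def abs_cf_log_chi2(x):
    """|E exp(i x ln(eta^2))| for eta ~ N(0,1), via |Gamma(1/2 + ix)| / sqrt(pi) = cosh(pi x)^(-1/2)."""
    return 1.0 / math.sqrt(math.cosh(math.pi * x))

def abs_cf_monte_carlo(x, n=100_000, seed=1):
    """Monte Carlo estimate of the same modulus, used only as a sanity check."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        total += cmath.exp(1j * x * math.log(eta * eta))
    return abs(total / n)

# Ratio to the bound sqrt(2) * exp(-(pi/2)|x|) appearing in (1.3); it tends to 1 as x grows.
ratio_at_5 = abs_cf_log_chi2(5.0) / (math.sqrt(2.0) * math.exp(-math.pi * 5.0 / 2.0))
```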

In density deconvolution of i.i.d. variables, the $X_t$'s and the $\varepsilon_t$'s are i.i.d. and the sequences
$(X_t)_{t \geq 0}$ and $(\varepsilon_t)_{t \in \mathbb{Z}}$ are independent (for short we shall refer to this case as the i.i.d.
case). In the setting of Model (1.2), the classical assumptions of independence between the
processes $(X_t)_{t \geq 0}$ and $(\varepsilon_t)_{t \in \mathbb{Z}}$ are no longer satisfied and the tools for deconvolution have
to be revisited.

As in density deconvolution for i.i.d. variables, the slowest rates of convergence for
estimating $g$ are obtained for super smooth error densities. For instance, in the i.i.d. case,
when $\varepsilon_t$ is Gaussian or the log of a squared Gaussian and $g$ belongs to some Sobolev class,
the minimax rates are negative powers of $\ln(n)$ (see Fan (1991)). Nevertheless, it has been
noticed by several authors (see Pensky and Vidakovic (1999), Butucea (2004), Butucea
and Tsybakov (2005), Comte et al. (2006)) that the rates are improved if $g$ has stronger
smoothness properties. So, we describe the smoothness properties of $g$ by the set

(1.4)  $S_{s,r,b}(C_1) = \{\psi \text{ such that } \int |\psi^*(x)|^2 (x^2+1)^s \exp\{2b|x|^r\}\,dx \leq C_1\}$

for $s, r, b$ unknown nonnegative numbers. When $r = 0$, the class $S_{s,r,b}(C_1)$ corresponds to
a Sobolev ball. When $r > 0$, $b > 0$, functions belonging to $S_{s,r,b}(C_1)$ are infinitely
differentiable.

Our estimator of $g$ is constructed by minimizing an appropriate penalized contrast
function depending only on the observations and on $f_\varepsilon$. It is chosen in a purely data-driven way
among a collection of non-adaptive estimators. We start with the study of those non-adaptive
estimators and show that their mean integrated squared error (MISE) has the same order
as in the i.i.d. case. In particular they reach the minimax rates of the i.i.d. case in all
cases where they are known (see Fan (1991), Butucea (2004) and Butucea and Tsybakov
(2005)). Next we prove that the MISE of our adaptive estimator is of the same order as
the MISE of the best non-adaptive estimator, up to some possible negligible logarithmic
loss in one case.

Van Es et al. (2005) considered the case where $\eta_t$ is Gaussian,
the density $g$ of $X_t$ is twice differentiable, and the process $(Z_t, X_t)$ is $\alpha$-mixing. Here
we consider various types of error density, and we do not make any assumption on the
smoothness of $g$: this is the advantage of the adaptive procedure. We shall consider
two types of dependence properties, which are satisfied by many ARCH processes. First
we shall use the classical $\beta$-mixing properties of general ARCH models, as recalled in
Doukhan (1994) and described in more detail in Carrasco and Chen (2002). But we
also illustrate that more recent coefficients can be used in our context, which allow an
easy characterization of the dependence properties as a function of the parameters of the
models. Those new dependence coefficients, recently defined and studied in Dedecker and
Prieur (2005), are interesting and powerful because they require much lighter conditions
on the models. Such ideas have been popularized by Ango Nze and Doukhan (2004) and
Doukhan et al. (2006). For instance, these coefficients make it possible to deal with the general
ARCH($\infty$) processes defined by Giraitis et al. (2000).

The paper is organized as follows. Many examples are described in Section 2 together
with their dependence properties. The estimator is defined in Section 3. The MISE bounds
are given in Section 4 and the proofs are given in Section 5.

2. The model and its dependence properties 
2.1. Models and examples. A particular case of model (1.1) is

(2.1)  $Y_t = \sigma_t \eta_t$, with $\sigma_t = f(\eta_{t-1}, \eta_{t-2}, \ldots)$

for some measurable function $f$. Another important case is

(2.2)  $Y_t = \sigma_t \eta_t$, with $\sigma_t = f(\sigma_{t-1}, \eta_{t-1})$ and $\sigma_0$ independent of $(\eta_t)_{t \geq 0}$,

that is, $\sigma_t$ is a stationary Markov chain.

We begin with models satisfying a recursive equation, whose stationary solution satisfies
(2.1). The original ARCH model as introduced by Engle (1982) was given by

(2.3)  $Y_t = \sqrt{a + b Y_{t-1}^2}\; \eta_t$, $a > 0$, $b \geq 0$.

It has been generalized by Bollerslev (1986) with the class of GARCH($p$, $q$) models defined
by $Y_t = \sigma_t \eta_t$ and

(2.4)  $\sigma_t^2 = a + \sum_{i=1}^{p} a_i Y_{t-i}^2 + \sum_{j=1}^{q} b_j \sigma_{t-j}^2$,

where the coefficients $a$, $a_i$, $i = 1, \ldots, p$ and $b_j$, $j = 1, \ldots, q$ are all positive real numbers.
Those processes were studied from the point of view of existence and stationarity of
solutions by Bougerol and Picard (1992a, 1992b) and Ango Nze (1992). Under the condition
$\sum_{i=1}^{p} a_i + \sum_{j=1}^{q} b_j < 1$, this model has a unique stationary solution of the form (2.1).
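
The GARCH($p$, $q$) recursion (2.4) and the sufficient condition $\sum a_i + \sum b_j < 1$ can be sketched as follows. This is only an illustration: the Gaussian noise, the burn-in length, the initial values and the function names `garch_is_stationary` and `simulate_garch` are assumptions of the example, not part of the models above.

```python
import math
import random

def garch_is_stationary(a_coefs, b_coefs):
    """Sufficient condition quoted in the text: sum(a_i) + sum(b_j) < 1."""
    return sum(a_coefs) + sum(b_coefs) < 1.0

def simulate_garch(n, a, a_coefs, b_coefs, burn_in=500, seed=0):
    """sigma_t^2 = a + sum_i a_i Y_{t-i}^2 + sum_j b_j sigma_{t-j}^2, Y_t = sigma_t * eta_t."""
    rng = random.Random(seed)
    p, q = len(a_coefs), len(b_coefs)
    y2 = [0.0] * p          # past squared observations, most recent first
    s2 = [a] * q            # past conditional variances, most recent first
    out = []
    for t in range(n + burn_in):
        sig2 = (a
                + sum(c * v for c, v in zip(a_coefs, y2))
                + sum(c * v for c, v in zip(b_coefs, s2)))
        y = math.sqrt(sig2) * rng.gauss(0.0, 1.0)
        y2 = ([y * y] + y2)[:p]
        s2 = ([sig2] + s2)[:q]
        if t >= burn_in:    # discard the transient so the path is close to stationary
            out.append((y, sig2))
    return out

path = simulate_garch(200, 0.1, [0.1], [0.8])
```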

Many extensions have been proposed since then. A general linear example of model is
given by the ARCH($\infty$) model described by Giraitis et al. (2000):

(2.5)  $\sigma_t^2 = a + \sum_{j=1}^{\infty} a_j Y_{t-j}^2$,

where $a \geq 0$ and $a_j \geq 0$. Again if $\sum_{j \geq 1} a_j < 1$, then there exists a unique strictly
stationary solution to (2.5) of the form (2.1).

For the models satisfying (2.2), let us cite first the so-called augmented GARCH(1, 1)
models introduced by Duan (1997):

(2.6)  $\Lambda(\sigma_t^2) = c(\eta_{t-1}) \Lambda(\sigma_{t-1}^2) + h(\eta_{t-1})$,






where $\Lambda$ is an increasing and continuous function on $\mathbb{R}^+$. We refer to Duan (1997) for
numerous examples of more standard models belonging to this class. There exists a stationary
solution to (2.6), provided $c$ satisfies the condition A2 given in Carrasco and Chen (2002)
(this condition is satisfied as soon as $E(|c(\eta_0)|^s) < 1$ and $E(|h(\eta_0)|^s) < \infty$ for some integer $s \geq 1$).
An example of the model (2.6) is the threshold
ARCH model (see Zakoian):

(2.7)  $\sigma_t = a + b \sigma_{t-1} \eta_{t-1} 1_{\{\eta_{t-1} > 0\}} - c \sigma_{t-1} \eta_{t-1} 1_{\{\eta_{t-1} < 0\}}$, $a, b, c > 0$,

for which $c(\eta_{t-1}) = b \eta_{t-1} 1_{\{\eta_{t-1} > 0\}} - c \eta_{t-1} 1_{\{\eta_{t-1} < 0\}}$ and $h = a$. In particular, the condition
for stationarity is satisfied as soon as $b \vee c < 1$.

Other models satisfying (2.2) are the nonlinear ARCH models (see Doukhan (1994), p.
106-107), for which:

(2.8)  $\sigma_t = f(\sigma_{t-1} \eta_{t-1})$.

There exists a stationary solution to (2.8) provided that the density of $\eta_0$ is positive on a
neighborhood of 0 and $\limsup_{|x| \to \infty} |f(x)/x| < 1$.

In the next section, we define the dependence coefficients that we shall use in this
paper, and we give the dependence properties of the models (2.1)-(2.8) in terms of these
coefficients.

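2.2. Measures of dependence. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $W$ be a random
vector with values in a Banach space $(\mathbb{B}, \|\cdot\|_{\mathbb{B}})$, and let $\mathcal{M}$ be a $\sigma$-algebra of $\mathcal{A}$. Let $P_{W|\mathcal{M}}$
be a conditional distribution of $W$ given $\mathcal{M}$, and let $P_W$ be the distribution of $W$. Let
$\mathcal{B}(\mathbb{B})$ be the Borel $\sigma$-algebra on $(\mathbb{B}, \|\cdot\|_{\mathbb{B}})$, and let $\Lambda_1(\mathbb{B})$ be the set of 1-Lipschitz functions
from $(\mathbb{B}, \|\cdot\|_{\mathbb{B}})$ to $\mathbb{R}$. Define now

$\beta(\mathcal{M}, \sigma(W)) = E\Big( \sup_{A \in \mathcal{B}(\mathbb{B})} |P_{W|\mathcal{M}}(A) - P_W(A)| \Big)$,

and if $E(\|W\|_{\mathbb{B}}) < \infty$, $\tau(\mathcal{M}, W) = E\Big( \sup_{f \in \Lambda_1(\mathbb{B})} |P_{W|\mathcal{M}}(f) - P_W(f)| \Big)$.

The coefficient $\beta(\mathcal{M}, \sigma(W))$ is the usual mixing coefficient, introduced by Rozanov and
Volkonskii. The coefficient $\tau(\mathcal{M}, W)$ has been introduced by Dedecker and Prieur
(2005).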

Let $(W_t)_{t \geq 0}$ be a strictly stationary sequence of $\mathbb{R}^2$-valued random variables. On $\mathbb{R}^2$, we
put the norm $\|x - y\|_{\mathbb{R}^2} = |x_1 - y_1| + |x_2 - y_2|$. For any $k \geq 0$, define the coefficients

(2.9)  $\beta_1(k) = \beta(\sigma(W_0), \sigma(W_k))$, and if $E(\|W_0\|_{\mathbb{R}^2}) < \infty$, $\tau_1(k) = \tau(\sigma(W_0), W_k)$.

On $(\mathbb{R}^2)^l$, we put the norm $\|x - y\|_{(\mathbb{R}^2)^l} = l^{-1}(\|x_1 - y_1\|_{\mathbb{R}^2} + \cdots + \|x_l - y_l\|_{\mathbb{R}^2})$. Let
$\mathcal{M}_i = \sigma(W_k, 0 \leq k \leq i)$. The coefficients $\beta_\infty(k)$ and $\tau_\infty(k)$ are defined by

(2.10)  $\beta_\infty(k) = \sup_{i \geq 0} \sup_{l \geq 1} \{\beta(\mathcal{M}_i, \sigma(W_{i_1}, \ldots, W_{i_l})),\ i + k \leq i_1 < \cdots < i_l\}$,

and if $E(\|W_1\|_{\mathbb{R}^2}) < \infty$,

(2.11)  $\tau_\infty(k) = \sup_{i \geq 0} \sup_{l \geq 1} \{\tau(\mathcal{M}_i, (W_{i_1}, \ldots, W_{i_l})),\ i + k \leq i_1 < \cdots < i_l\}$.

We say that the process $(W_t)_{t \geq 0}$ is $\beta$-mixing (resp. $\tau$-dependent) if the coefficients
$\beta_\infty(k)$ (resp. $\tau_\infty(k)$) tend to zero as $k$ tends to infinity. We say that it is geometrically
$\beta$-mixing (resp. $\tau$-dependent), if there exist $a > 1$ and $C > 0$ such that $\beta_\infty(k) \leq C a^{-k}$
(resp. $\tau_\infty(k) \leq C a^{-k}$) for all $k \geq 1$.

We now recall the coupling properties associated with the dependency coefficients.
Assume that $\Omega$ is rich enough, which means that there exists $U$ uniformly distributed over
$[0, 1]$ and independent of $\mathcal{M} \vee \sigma(W)$. There exist two $\mathcal{M} \vee \sigma(U) \vee \sigma(W)$-measurable random
variables $W_1^*$ and $W_2^*$ distributed as $W$ and independent of $\mathcal{M}$ such that

(2.12)  $\beta(\mathcal{M}, \sigma(W)) = P(W \neq W_1^*)$ and $\tau(\mathcal{M}, W) = E(\|W - W_2^*\|_{\mathbb{B}})$.

The first equality in (2.12) is due to Berbee (1979), and the second one has been established
in Dedecker and Prieur (2005), Section 7.1.

As consequences of the coupling properties (2.12), we have the following covariance
inequalities. Let $\|\cdot\|_{\infty, P}$ be the $L^\infty(\Omega, P)$-norm. For two measurable functions $f, h$ from
$\mathbb{R}$ to $\mathbb{C}$, we have

(2.13)  $|\mathrm{Cov}(f(Y), h(X))| \leq 2 \|f(Y)\|_{\infty, P} \|h(X)\|_{\infty, P}\, \beta(\sigma(X), \sigma(Y))$.

Moreover, if $\mathrm{Lip}(h)$ is the Lipschitz coefficient of $h$,

(2.14)  $|\mathrm{Cov}(f(Y), h(X))| \leq \|f(Y)\|_{\infty, P}\, \mathrm{Lip}(h)\, \tau(\sigma(Y), X)$.

Thus, using that $t \mapsto e^{ixt}$ is $|x|$-Lipschitz, we obtain the bounds

(2.15)  $|\mathrm{Cov}(e^{ixZ_1}, e^{ixX_k})| \leq 2 \beta_1(k - 1)$ and $|\mathrm{Cov}(e^{ixZ_1}, e^{ixX_k})| \leq |x| \tau_1(k - 1)$.

2.3. Application to ARCH models. For the models (1.1) and (1.2), the $\beta$-mixing
coefficients of the process

(2.16)  $(W_t)_{t \in \mathbb{Z}} = ((Z_t, X_t))_{t \in \mathbb{Z}}$

are smaller than those of $((Y_t, \sigma_t))_{t \in \mathbb{Z}}$ (because of the inclusion of $\sigma$-algebras). If we assume
that in all cases the $\eta_t$'s are centered with unit variance and admit a density with respect
to the Lebesgue measure, then

• The process $((Y_t, \sigma_t))_{t \in \mathbb{Z}}$ defined by Model (2.3) is geometrically $\beta$-mixing as soon
as $0 < b < 1$.

• The process $((Y_t, \sigma_t))_{t \in \mathbb{Z}}$ defined by Model (2.4) is geometrically $\beta$-mixing as soon
as $\sum_{i=1}^{p} a_i + \sum_{j=1}^{q} b_j < 1$ (see Carrasco and Chen (2002)).

• The process $((Y_t, \sigma_t))_{t \in \mathbb{Z}}$ defined by Model (2.6) is geometrically $\beta$-mixing as soon
as: the density of $\eta_0$ is positive on an open set containing 0; $c$ and $h$ are polynomial
functions; there exists an integer $s \geq 1$ such that $|c(0)| < 1$, $E(|c(\eta_0)|^s) < 1$, and
$E(|h(\eta_0)|^s) < \infty$. See Proposition 5 in Carrasco and Chen (2002).

• The process $((Y_t, \sigma_t))_{t \in \mathbb{Z}}$ defined by Model (2.7) is geometrically $\beta$-mixing as soon
as $0 < b \vee c < 1$.

• The process $((Y_t, \sigma_t))_{t \in \mathbb{Z}}$ defined by Model (2.8) is geometrically $\beta$-mixing as soon as
the density of $\eta_0$ is positive on a neighborhood of 0 and $\limsup_{|x| \to +\infty} |f(x)/x| < 1$
(see Doukhan (1994), Proposition 6 page 107).

Note that some other extensions to nonlinear models having stationarity and dependency
properties can be found in Lee and Shin (2005).

Concerning the $\tau$-dependence, here is a general method to handle the models (2.1)
and (2.2). The following proposition will be proved in the appendix (see Ango Nze and
Doukhan (2004) and Doukhan et al. (2006) for related results).

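Proposition 2.1. Let $Y_t$ and $\sigma_t$ satisfy either (2.1) or (2.2). For Model (2.1), let $(\eta_t')_{t \in \mathbb{Z}}$
be an independent copy of $(\eta_t)_{t \in \mathbb{Z}}$, and for $t > 0$, let $\sigma_t^* = f(\eta_{t-1}, \ldots, \eta_1, \eta_0', \eta_{-1}', \ldots)$.
For Model (2.2), let $\sigma_0^*$ be a copy of $\sigma_0$ independent of $(\sigma_0, \eta_t)_{t \in \mathbb{Z}}$, and for $t > 0$ let
$\sigma_t^* = f(\sigma_{t-1}^*, \eta_{t-1})$. Let $\delta_n$ be a non-increasing sequence such that

(2.17)  $2 E(|\sigma_n^2 - (\sigma_n^*)^2|) \leq \delta_n$.

Then

(1) The process $((Y_t^2, \sigma_t^2))_{t \geq 0}$ is $\tau$-dependent with $\tau_\infty(n) \leq \delta_n$.

(2) Assume that $Y_0^2$, $\sigma_0^2$ have densities satisfying $\max(f_{\sigma^2}(x), f_{Y^2}(x)) \leq C |\ln(x)|^\alpha x^{-\rho}$
in a neighborhood of 0, for some $\alpha \geq 0$ and $0 \leq \rho < 1$. The process $((X_t, Z_t))_{t \geq 0}$
is $\tau$-dependent with $\tau_\infty(n) = O\big((\delta_n)^{(1-\rho)/(2-\rho)} |\ln(\delta_n)|^{(1+\alpha)/(2-\rho)}\big)$.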

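Consider Model (2.5), and assume that $c = \sum_{j \geq 1} a_j < 1$. Let then $((Y_t, \sigma_t))_{t \in \mathbb{Z}}$ be the
unique strictly stationary solution of the form (2.1). Then (2.17) holds with a sequence
$\delta_n$ expressed in terms of the tail sums $\sum_{i=k+1}^{\infty} a_i$.

Note that if $\sigma_0^2$ and $\eta_0^2$ have bounded densities, then $\max(f_{\sigma^2}(x), f_{Y^2}(x)) \leq C |\ln(x)|$ in a
neighborhood of 0, so that Proposition 2.1(2) holds with $\rho = 0$ and $\alpha = 1$.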

Under the assumptions of Proposition 2.1(2), we obtain for Model (2.5) the following
rates for $((X_t, Z_t))_{t \geq 0}$:

• If $a_j = 0$ for $j \geq J$, then $((X_t, Z_t))_{t \geq 0}$ is geometrically $\tau$-dependent.

• If $a_j = O(b^j)$ for some $b < 1$, then $\tau_\infty(n) = O(\kappa^{\sqrt{n}})$ for some $\kappa < 1$.

• If $a_j = O(j^{-b})$ for some $b > 1$, then $\tau_\infty(n) = O\big(n^{-b(1-\rho)/(2-\rho)} (\ln(n))^{(b+2)(1+\alpha)/2}\big)$.

For more general models than (2.5), we refer to Doukhan et al. (2006).
For Model (2.2), if there exists $\kappa < 1$ such that

(2.18)  $E(|(f(x, \eta_0))^2 - (f(y, \eta_0))^2|) \leq \kappa |x^2 - y^2|$,

then one can take $\delta_n = 4 E(\sigma_0^2) \kappa^n$. Hence, under the assumptions of Proposition 2.1(2),
$((X_t, Z_t))_{t \geq 0}$ is geometrically $\tau$-dependent. An example of a Markov chain satisfying (2.18)
is the autoregressive model $\sigma_t^2 = h(\sigma_{t-1}^2) + r(\eta_{t-1})$ for some $\kappa$-Lipschitz function $h$.

3. The estimators 

For two complex-valued functions $u$ and $v$ in $L_2(\mathbb{R}) \cap L_1(\mathbb{R})$, let $u^*(x) = \int e^{itx} u(t)\,dt$,
$u * v(x) = \int u(y) v(x - y)\,dy$, and $\langle u, v \rangle = \int u(x) \overline{v}(x)\,dx$ with $\overline{z}$ the conjugate of a
complex number $z$. We also denote $\|u\|_1 = \int |u(x)|\,dx$, $\|u\|^2 = \int |u(x)|^2\,dx$, and
$\|u\|_\infty = \sup_{x \in \mathbb{R}} |u(x)|$.

3.1. The projection spaces. Let $\varphi(x) = \sin(\pi x)/(\pi x)$. For $m \in \mathbb{N}$ and $j \in \mathbb{Z}$, set
$\varphi_{m,j}(x) = \sqrt{m}\, \varphi(mx - j)$. The functions $\{\varphi_{m,j}\}_{j \in \mathbb{Z}}$ constitute an orthonormal system in
$L_2(\mathbb{R})$ (see e.g. Meyer, p. 22). Let us define

$S_m = \mathrm{span}\{\varphi_{m,j},\ j \in \mathbb{Z}\}$, $m \in \mathbb{N}$.

The space $S_m$ is exactly the subspace of $L_2(\mathbb{R})$ of functions having a Fourier transform
with compact support contained in $[-\pi m, \pi m]$. The orthogonal projection of $g$ on $S_m$ is
$g_m = \sum_{j \in \mathbb{Z}} a_{m,j}(g) \varphi_{m,j}$ where $a_{m,j}(g) = \langle \varphi_{m,j}, g \rangle$. To obtain representations having a
finite number of "coordinates", we introduce

$S_m^{(n)} = \mathrm{span}\{\varphi_{m,j},\ |j| \leq k_n\}$

with integers $k_n$ to be specified later. The family $\{\varphi_{m,j}\}_{|j| \leq k_n}$ is an orthonormal basis of
$S_m^{(n)}$ and the orthogonal projection of $g$ on $S_m^{(n)}$ is given by $g_m^{(n)} = \sum_{|j| \leq k_n} a_{m,j}(g) \varphi_{m,j}$.

Subsequently a space $S_m^{(n)}$ will be referred to as a "model" as well as a "projection space".

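3.2. Construction of the minimum contrast estimators. We subsequently assume
that

(3.1)  $f_\varepsilon$ belongs to $L_2(\mathbb{R})$ and is such that $\forall x \in \mathbb{R}$, $f_\varepsilon^*(x) \neq 0$.

Note that the square integrability of $f_\varepsilon$ and (1.3) require that $\gamma > 1/2$ when $\delta = 0$. Under
Condition (3.1) and for $t$ in $S_m^{(n)}$, we define the contrast function

$\gamma_n(t) = \frac{1}{n} \sum_{i=1}^{n} \big[ \|t\|^2 - 2 u_t^*(Z_i) \big]$, with $u_t(x) = \frac{1}{2\pi} \frac{t^*(-x)}{f_\varepsilon^*(x)}$.

Then, for an arbitrary fixed integer $m$, an estimator of $g$ belonging to $S_m^{(n)}$ is defined by

(3.2)  $\hat{g}_m^{(n)} = \arg\min_{t \in S_m^{(n)}} \gamma_n(t)$.

By using Parseval and inverse Fourier formulae we obtain that $E[u_t^*(Z_i)] = \langle t, g \rangle$, so that
$E(\gamma_n(t)) = \|t - g\|^2 - \|g\|^2$ is minimal when $t = g$. This shows that $\gamma_n(t)$ suits well for the
estimation of $g$. It is easy to see that

$\hat{g}_m^{(n)} = \sum_{|j| \leq k_n} \hat{a}_{m,j} \varphi_{m,j}$ with $\hat{a}_{m,j} = \frac{1}{n} \sum_{i=1}^{n} u_{\varphi_{m,j}}^*(Z_i)$, and $E(\hat{a}_{m,j}) = \langle g, \varphi_{m,j} \rangle = a_{m,j}(g)$.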
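
The coefficients $\hat{a}_{m,j}$ can be computed by numerical integration. Since $\varphi_{m,j}^*(x) = m^{-1/2} e^{ixj/m} 1_{[-\pi m, \pi m]}(x)$, one gets $\hat{a}_{m,j} = \frac{1}{2\pi\sqrt{m}} \int_{-\pi m}^{\pi m} \frac{1}{n}\sum_i e^{ix(Z_i - j/m)} / f_\varepsilon^*(x)\,dx$. The sketch below evaluates this integral with a plain trapezoid rule; the function names, the grid size and the use of the trapezoid rule are choices made for this example, not the implementation used by the authors. As a sanity check, taking $f_\varepsilon^* \equiv 1$ (the degenerate no-noise case, which is outside Condition (3.1)) must return the empirical mean of $\varphi_{m,j}(Z_i)$.

```python
import cmath
import math

def sinc_basis(m, j, x):
    """phi_{m,j}(x) = sqrt(m) * sin(pi (m x - j)) / (pi (m x - j))."""
    y = math.pi * (m * x - j)
    return math.sqrt(m) * (1.0 if y == 0 else math.sin(y) / y)

def coef_hat(Z, m, j, f_eps_star, n_grid=2000):
    """Trapezoid approximation of hat a_{m,j} = (1/n) sum_i u*_{phi_{m,j}}(Z_i)."""
    lo, hi = -math.pi * m, math.pi * m
    h = (hi - lo) / n_grid
    total = 0j
    for k in range(n_grid + 1):
        x = lo + k * h
        w = 0.5 if k in (0, n_grid) else 1.0
        # empirical characteristic function of the Z_i, shifted by j/m
        ecf = sum(cmath.exp(1j * x * (z - j / m)) for z in Z) / len(Z)
        total += w * ecf / f_eps_star(x)
    # the integral is real when f_eps_star(-x) is the conjugate of f_eps_star(x)
    return (total * h / (2.0 * math.pi * math.sqrt(m))).real

Z_demo = [0.3, -1.2, 0.7]
no_noise = coef_hat(Z_demo, 2, 1, lambda x: 1.0)
direct = sum(sinc_basis(2, 1, z) for z in Z_demo) / len(Z_demo)
```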

3.3. Minimum penalized contrast estimator. The minimum penalized estimator of $g$
is defined as $\tilde{g} = \hat{g}_{\hat{m}}^{(n)}$ where $\hat{m}$ is chosen in a purely data-driven way. The main point
of the estimation procedure lies in the choice of $m = \hat{m}$ (or equivalently in the choice of
model $S_{\hat{m}}^{(n)}$) involved in the estimators $\hat{g}_m^{(n)}$ given by (3.2), in order to mimic the oracle
parameter

(3.3)  $\breve{m}_g = \arg\min_{m} E \|\hat{g}_m^{(n)} - g\|^2$.

The model selection is performed in an automatic way, using the following penalized criterion

(3.4)  $\tilde{g} = \hat{g}_{\hat{m}}^{(n)}$ with $\hat{m} = \arg\min_{m \in \{1, \ldots, m_n\}} \big[ \gamma_n(\hat{g}_m^{(n)}) + \mathrm{pen}(m) \big]$,

where $\mathrm{pen}(m)$ is a penalty function that depends on $f_\varepsilon^*(\cdot)$ through $\Delta(m)$ defined by

(3.5)  $\Delta(m) = \frac{1}{2\pi} \int_{-\pi m}^{\pi m} \frac{dx}{|f_\varepsilon^*(x)|^2}$.
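
The quantity $\Delta(m)$ is explicit once $f_\varepsilon^*$ is known. The following sketch reads (3.5) as $\Delta(m) = \frac{1}{2\pi}\int_{-\pi m}^{\pi m} |f_\varepsilon^*(x)|^{-2}\,dx$ (an assumption about the garbled display, consistent with the order $\Gamma(m)$ given in Section 4) and evaluates it by a trapezoid rule. For standard Laplace noise, $f_\varepsilon^*(x) = 1/(1+x^2)$ and the integral has the closed form $(A + 2A^3/3 + A^5/5)/\pi$ with $A = \pi m$, which is used as a check. Function names and grid size are hypothetical.

```python
import math

def delta(m, abs_f_eps_star, n_grid=20_000):
    """Trapezoid approximation of (1/(2 pi)) * integral over [-pi m, pi m] of |f_eps^*(x)|^(-2)."""
    lo, hi = -math.pi * m, math.pi * m
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        w = 0.5 if k in (0, n_grid) else 1.0
        total += w / abs_f_eps_star(lo + k * h) ** 2
    return total * h / (2.0 * math.pi)

def laplace_cf(x):
    """Characteristic function of the standard Laplace density (1/2) exp(-|x|)."""
    return 1.0 / (1.0 + x * x)

def delta_laplace_closed_form(m):
    """Exact value of Delta(m) for standard Laplace noise."""
    A = math.pi * m
    return (A + 2.0 * A ** 3 / 3.0 + A ** 5 / 5.0) / math.pi
```

For this ordinary smooth example ($\gamma = 2$, $\delta = 0$), $\Delta(m)$ grows like $(\pi m)^5$, in line with the polynomial variance order of the ordinary smooth case.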

The key point in the dependent context is to find a penalty function not depending on the
dependency coefficients such that

$E \|\tilde{g} - g\|^2 \leq C \inf_{m \in \{1, \ldots, m_n\}} E \|\hat{g}_m^{(n)} - g\|^2$.

In that way, the estimator $\tilde{g}$ is adaptive since it achieves the best rate among the estimators
$\hat{g}_m^{(n)}$, without any prior knowledge of the smoothness of $g$.
4. Density estimation bounds 

From now on, the dependence coefficients are defined as in (2.9), (2.10) and (2.11)
with $(W_t)_{t \in \mathbb{Z}} = ((Z_t, X_t))_{t \in \mathbb{Z}}$.

4.1. Rates of convergence of the minimum contrast estimators $\hat{g}_m^{(n)}$. Subsequently,
the density $g$ is assumed to satisfy the following assumption:

(4.1)  $g \in L_2(\mathbb{R})$, and there exists $M_2 > 0$ such that $\int x^2 g^2(x)\,dx \leq M_2 < \infty$.

Assumption (4.1), which is due to the construction of the estimator, already appears in
density deconvolution in the independent framework in Comte et al. (2005, 2006). It is
important to note that Assumption (4.1) is very unrestrictive. In particular, all densities
having tails of order $|x|^{-(s+1)}$ as $x$ tends to infinity satisfy (4.1) only if $s > 1/2$. One can
cite for instance the Cauchy distribution or all stable distributions with exponent $r > 1/2$
(see Devroye (1986)). The Lévy distribution, with exponent $r = 1/2$, does not satisfy
(4.1).

Note that (4.1) is fulfilled if $g$ is bounded by $M_0$ and $E(X_1^2) \leq M_1 < +\infty$, with $M_2 =
M_0 M_1$.

The order of the MISE of $\hat{g}_m^{(n)}$ is given in the following proposition.
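Proposition 4.1. If (3.1) and (4.1) hold, then $\hat{g}_m^{(n)}$ defined by (3.2) satisfies

$E \|g - \hat{g}_m^{(n)}\|^2 \leq \|g - g_m\|^2 + \frac{m^2 (M_2 + 1)}{k_n} + \frac{2 \Delta(m)}{n} + \frac{2 R_m}{n}$,

where

(4.2)  $R_m = \frac{1}{\pi} \sum_{k=2}^{n} \int_{-\pi m}^{\pi m} \left| \frac{\mathrm{Cov}(e^{ixZ_1}, e^{ixX_k})}{f_\varepsilon^*(-x)} \right| dx$.

Moreover, $R_m \leq \min(R_{m,\beta}, R_{m,\tau})$, where

$R_{m,\beta} = 4 \Delta_{1/2}(m) \sum_{k=1}^{n-1} \beta_1(k)$ and $R_{m,\tau} = \pi m \Delta_{1/2}(m) \sum_{k=1}^{n-1} \tau_1(k)$,

with $\beta_1$ and $\tau_1$ defined by (2.9), and where

(4.3)  $\Delta_{1/2}(m) = \frac{1}{2\pi} \int_{-\pi m}^{\pi m} \frac{dx}{|f_\varepsilon^*(x)|}$.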

This proposition requires several comments. 

As usual, the order of the risk is given by a bias term $\|g_m - g\|^2 + m^2 (M_2 + 1)/k_n$ and
a variance term $2\Delta(m)/n + 2R_m/n$. As in density deconvolution for i.i.d. variables, the
variance term $2\Delta(m)/n + 2R_m/n$ depends on the rate of decay of the Fourier transform of
$f_\varepsilon$. It is the sum of the variance term appearing in density deconvolution for i.i.d. variables,
$2\Delta(m)/n$, and of an additional term $2R_m/n$. This last term $R_m$ involves the dependency
coefficients and the quantity $\Delta_{1/2}(m)$, which is specific to the ARCH problem. The point
is that, as in the i.i.d. case, the main order term in the variance part is $\Delta(m)/n$, which
does not involve the dependency coefficients. In other words, the dependency coefficients
only appear in front of the additional and negligible term $\Delta_{1/2}(m)/n$, specific to ARCH
models.

The bias term is the sum of the usual bias term $\|g_m - g\|^2$, depending on the smoothness
properties of $g$, and of an additional term $m^2 (M_2 + 1)/k_n$. With a suitable choice of $k_n$,
not depending on $g$, this last term is negligible with respect to the variance term.

Concerning the main variance term, $\Delta(m)$ given by (3.5) has the same order as

$\Gamma(m) = (1 + (\pi m)^2)^\gamma (\pi m)^{1-\delta} \exp\{2\mu (\pi m)^\delta\}$,

up to some constant bounded by

(4.4)  $\lambda_1(f_\varepsilon, \kappa_0) = \frac{1}{\kappa_0^2 \pi R(\mu, \delta)}$, where $R(\mu, \delta) = 1_{\{\delta = 0\}} + 2\mu\delta 1_{\{\delta > 0\}}$.

The rates resulting from Proposition 4.1 under (1.3) and (1.4) are given in the following
corollary.

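Corollary 4.1. Assume that (1.3), (3.1) and (4.1) hold, that $g$ belongs to $S_{s,r,b}(C_1)$
defined by (1.4), and that $k_n \geq n$. Assume either that

(1) $\sum_{k \geq 1} \beta_1(k) < +\infty$,

(2) or $\delta = 0$, $\gamma > 1$ in (1.3) and $\sum_{k \geq 1} \tau_1(k) < +\infty$,

(3) or $\delta > 0$ in (1.3) and $\sum_{k \geq 1} \tau_1(k) < +\infty$.

Then $\hat{g}_m^{(n)}$ defined by (3.2) satisfies

(4.5)  $E \|g - \hat{g}_m^{(n)}\|^2 \leq \frac{C_1}{2\pi} (m^2 \pi^2 + 1)^{-s} \exp\{-2b \pi^r m^r\} + \frac{2 \lambda_1(f_\varepsilon, \kappa_0) \Gamma(m)}{n} + \frac{C_2}{n} \Gamma(m) o_m(1)$,

where $C_1$ and $C_2$ are finite constants. The constant $C_2$ depends on $\sum_{k \geq 1} \beta_1(k)$ (respectively
on $\sum_{k \geq 1} \tau_1(k)$).

If $\gamma = 1$ when $\delta = 0$, then the bound becomes

(4.6)  $E \|g - \hat{g}_m^{(n)}\|^2 \leq \frac{C_1}{2\pi} (m^2 \pi^2 + 1)^{-s} \exp\{-2b \pi^r m^r\} + \frac{(2 + C_2) \lambda_1(f_\varepsilon, \kappa_0) \Gamma(m)}{n}$,

with $C_2$ depending on $\sum_{k \geq 1} \beta_1(k)$ (respectively on $\sum_{k \geq 1} \tau_1(k)$).

The rate of convergence of $\hat{g}_{\breve{m}_g}^{(n)}$ is the same as the rate for density deconvolution for
i.i.d. sequences. Our context here encompasses the particular case considered by van Es
et al. (2005).

Table 1 below gives a summary of these rates obtained when minimizing the right-hand
side of (4.5). The $\breve{m}_g$ denotes the corresponding minimizer (see (3.3)).

Table 1. Choice of $\breve{m}_g$ and corresponding rates under Assumptions (1.3)
and (1.4).

  g \ f_ε             | δ = 0 (ordinary smooth)                  | δ > 0 (super smooth)
  --------------------+------------------------------------------+------------------------------------------
  r = 0, Sobolev(s)   | $\pi \breve{m}_g = O(n^{1/(2s+2\gamma+1)})$           | $\pi \breve{m}_g = [\ln(n)/(2\mu+1)]^{1/\delta}$
                      | rate $= O(n^{-2s/(2s+2\gamma+1)})$          | rate $= O((\ln(n))^{-2s/\delta})$
  --------------------+------------------------------------------+------------------------------------------
  r > 0               | $\pi \breve{m}_g = [\ln(n)/2b]^{1/r}$                | $\breve{m}_g$ solution of
                      | rate $= O\big(\ln(n)^{(2\gamma+1)/r}/n\big)$          | $\breve{m}_g^{2s+2\gamma+1-r} \exp\{2\mu(\pi \breve{m}_g)^\delta + 2b\pi^r \breve{m}_g^r\} = O(n)$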

When $r > 0$, $\delta > 0$ the value of $\breve{m}_g$ is not explicitly given. It is obtained as the solution
of the equation

$\breve{m}_g^{2s+2\gamma+1-r} \exp\{2\mu (\pi \breve{m}_g)^\delta + 2b \pi^r \breve{m}_g^r\} = O(n)$.

Consequently, the rate of $\hat{g}_{\breve{m}_g}^{(n)}$ is not easy to give explicitly and depends on the ratio $r/\delta$.
If $r/\delta$ or $\delta/r$ belongs to $]k/(k+1); (k+1)/(k+2)]$ with $k$ integer, the rate of convergence
can be expressed as a function of $k$. We refer to Comte et al. (2006) for further discussions
about those rates. We refer to Lacour (2006) for explicit formulae for the rates in the
special case $r > 0$ and $\delta > 0$.
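
The two explicit Sobolev rows of Table 1 can be turned into a small calculator. The sketch below reads the table entries as $\pi \breve{m}_g \asymp n^{1/(2s+2\gamma+1)}$ with rate $n^{-2s/(2s+2\gamma+1)}$ (ordinary smooth noise) and $\pi \breve{m}_g = [\ln(n)/(2\mu+1)]^{1/\delta}$ with rate $(\ln n)^{-2s/\delta}$ (super smooth noise); constants are ignored and the function names are hypothetical.

```python
import math

def sobolev_ordinary_smooth(n, s, gamma):
    """Case r = 0, delta = 0: order of the cutoff pi*m and of the MISE rate (constants dropped)."""
    d = 2.0 * s + 2.0 * gamma + 1.0
    return n ** (1.0 / d), n ** (-2.0 * s / d)

def sobolev_super_smooth(n, s, mu, delta):
    """Case r = 0, delta > 0: cutoff pi*m = [ln(n)/(2 mu + 1)]^(1/delta), rate (ln n)^(-2s/delta)."""
    cutoff = (math.log(n) / (2.0 * mu + 1.0)) ** (1.0 / delta)
    return cutoff, math.log(n) ** (-2.0 * s / delta)

# Laplace noise (gamma = 2) and a Sobolev density with s = 2: the rate is n^(-4/9).
cutoff_os, rate_os = sobolev_ordinary_smooth(10 ** 9, 2, 2)
```

The contrast between the two cases is the point of the table: with super smooth noise and a merely Sobolev density, the rate is only a negative power of $\ln(n)$.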

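4.2. Adaptive bound. Theorem 4.1 below gives a general bound which holds under weak
dependency conditions, for $\varepsilon$ being either ordinary or super smooth.
For $a > 1$, let $\mathrm{pen}(m)$ be defined by

(4.7)  $\mathrm{pen}(m) = 192 a \frac{\Delta(m)}{n}$ if $0 \leq \delta < 1/3$, and $\mathrm{pen}(m) = 64 a \lambda_3 \frac{\Delta(m)\, m^{\min((3\delta/2 - 1/2),\, \delta)}}{n}$ if $\delta \geq 1/3$,

where $\Delta(m)$ is defined by (3.5). The constant $\lambda_1(f_\varepsilon, \kappa_0)$ is defined in (4.4) and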



(4.8) A3 = 1 + 



32^7r'^ 

Al(/£,Ko) 



(V2 + 8)11 All 

V ^lUs, Ko)I[o<5<i + 2Ai(/e, K;o)ll5 



>1 
The important point here is that $\lambda_3$ is known. Hence the penalty is explicit up to a
numerical multiplicative constant. This procedure has already been practically studied for
independent sequences $(X_t)_{t \geq 1}$ and $(\varepsilon_t)_{t \geq 1}$ in Comte et al. (2005, 2006). In particular,
the practical implementation of the penalty functions and the calibration of the constants
have been studied in these two papers. Moreover, it is shown therein
that the estimation procedure is robust to various types of dependence, whether the errors
$\varepsilon_t$ are ordinary or super smooth (see Tables 4 and 5 in Comte et al. (2005)).


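In order to bound $\mathrm{pen}(m)$, we impose that

(4.9)  $\pi m_n \leq n^{1/(2\gamma+1)}$ if $\delta = 0$, and $\pi m_n \leq \left[ \frac{\ln(n)}{2\mu} + \frac{2\gamma + 1 - \delta}{2\delta\mu} \ln\left( \frac{\ln(n)}{2\mu} \right) \right]^{1/\delta}$ if $\delta > 0$.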

Subsequently we set

(4.10)  $C_a = \max(\kappa_a^2, 2\kappa_a)$ where $\kappa_a = (a + 1)/(a - 1)$.
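
To see how the constant in (4.10) behaves with the tuning parameter $a > 1$, here is a minimal sketch, reading (4.10) as $C_a = \max(\kappa_a^2, 2\kappa_a)$ with $\kappa_a = (a+1)/(a-1)$. Since $\kappa_a > 2$ exactly when $a < 3$, the squared term dominates for $a < 3$ and the linear term for $a > 3$. The function name is hypothetical.

```python
def oracle_constant(a):
    """C_a = max(kappa_a^2, 2 * kappa_a) with kappa_a = (a + 1) / (a - 1), for a > 1."""
    if a <= 1:
        raise ValueError("a must be greater than 1")
    kappa = (a + 1.0) / (a - 1.0)
    return max(kappa ** 2, 2.0 * kappa)
```

A larger $a$ lowers $C_a$ toward 2 but inflates the penalty (4.7), which is proportional to $a$.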

Theorem 4.1. Assume that $f_\varepsilon$ satisfies (1.3) and (3.1), that $g$ satisfies (4.1), and that $m_n$
satisfies (4.9). Let $\mathrm{pen}(m)$ be defined by (4.7). Consider the collection of estimators $\hat{g}_m^{(n)}$
defined by (3.2) with $k_n \geq n$ and $1 \leq m \leq m_n$. Let $\beta_\infty$ and $\tau_\infty$ be defined as in (2.10) and
(2.11) respectively. Assume either that

(1) $\beta_\infty(k) = O(k^{-(1+\theta)})$ for some $\theta > 3$,

(2) or $\delta = 0$, $\gamma \geq 3/2$ in (1.3) and $\tau_\infty(k) = O(k^{-(1+\theta)})$ for some $\theta > 3 + 2/(1 + 2\gamma)$,

(3) or $\delta > 0$ in (1.3) and $\tau_\infty(k) = O(k^{-(1+\theta)})$ for some $\theta > 3$.

Then the estimator $\tilde{g} = \hat{g}_{\hat{m}}^{(n)}$ defined by (3.4) satisfies

(4.11)  $E(\|g - \tilde{g}\|^2) \leq C_a \inf_{m \in \{1, \ldots, m_n\}} \left[ \|g - g_m\|^2 + \mathrm{pen}(m) + \frac{m^2 (M_2 + 1)}{n} \right] + \frac{C}{n}$,

where $C_a$ is defined in (4.10) and $C$ is a constant depending on $f_\varepsilon$, $a$, and $\sum_{k \geq 1} \beta_\infty(k)$
(respectively on $\sum_{k \geq 1} \tau_\infty(k)$).

Remark 4.1. In case (2), when $\delta = 0$ in (1.3), the condition on $\theta$ is weaker as $\gamma$ increases
and $f_\varepsilon$ gets smoother.

The estimator $\tilde{g}$ is adaptive in the sense that it is purely data-driven. This is due to the
fact that $\mathrm{pen}(\cdot)$ is explicitly known. In particular, its construction does not require any
prior smoothness knowledge on the unknown density $g$ and does not use the dependency
coefficients. This point is important since all quantities involving dependency coefficients
are usually not tractable in practice.

The main result in Theorem 4.1 shows that the MISE of $\tilde{g}$ automatically achieves the best
squared-bias/variance compromise (possibly up to some logarithmic factor). Consequently,
it achieves the best rate among the rates of the $\hat{g}_m^{(n)}$, even from a non-asymptotic point
of view. This last point is of utmost importance since the $m$ selected in practice are small
and far from the asymptotic regime. For a practical illustration of this point in the case of density
deconvolution of i.i.d. variables, we refer to Comte et al. (2005, 2006). Another important
point is that, if we consider the asymptotic trade-off, then the rates given in Table 1 are
automatically reached in most cases by the adaptive estimator $\tilde{g}$. Only in the case $\delta > 1/3$
and $r > 0$, a loss may occur in the rate of $\tilde{g}$. This comes from the additional power of
$m$ in the penalty for $\delta > 1/3$ with respect to the variance order $\Delta(m)$. Nevertheless, the
resulting loss in the rate has an order which is negligible compared to the main order rate.

As a conclusion, the estimator $\tilde{g}$ has the rate of the i.i.d. case, with an explicit penalty
function not depending on the dependency coefficients.



5. Proofs 

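5.1. Proof of Proposition 4.1. The proof of Proposition 4.1 follows the same lines as
in the independent framework (see Comte et al. (2006)). The main difference lies in the
control of the variance term. We keep the same notations as in Section 3. According to
(3.2), for any given $m$ belonging to $\{1, \ldots, m_n\}$, $\hat{g}_m^{(n)}$ satisfies $\gamma_n(\hat{g}_m^{(n)}) - \gamma_n(g_m^{(n)}) \leq 0$. For
a random variable $T$ with density $f_T$, and any function $\psi$ such that $\psi(T)$ is integrable, set
$\nu_{n,T}(\psi) = n^{-1} \sum_{i=1}^{n} [\psi(T_i) - \langle \psi, f_T \rangle]$. In particular,

(5.1)  $\nu_{n,Z}(u_t^*) = \frac{1}{n} \sum_{i=1}^{n} [u_t^*(Z_i) - \langle t, g \rangle]$.

Since

(5.2)  $\gamma_n(t) - \gamma_n(s) = \|t - g\|^2 - \|s - g\|^2 - 2 \nu_{n,Z}(u_{t-s}^*)$,

we infer that

(5.3)  $\|g - \hat{g}_m^{(n)}\|^2 \leq \|g - g_m^{(n)}\|^2 + 2 \nu_{n,Z}\big(u^*_{\hat{g}_m^{(n)} - g_m^{(n)}}\big)$.

Writing that $\hat{a}_{m,j} - a_{m,j} = \nu_{n,Z}(u^*_{\varphi_{m,j}})$, we obtain that

$\nu_{n,Z}\big(u^*_{\hat{g}_m^{(n)} - g_m^{(n)}}\big) = \sum_{|j| \leq k_n} (\hat{a}_{m,j} - a_{m,j}) \nu_{n,Z}(u^*_{\varphi_{m,j}}) = \sum_{|j| \leq k_n} [\nu_{n,Z}(u^*_{\varphi_{m,j}})]^2$.

Consequently, $E \|g - \hat{g}_m^{(n)}\|^2 \leq \|g - g_m^{(n)}\|^2 + 2 \sum_{j \in \mathbb{Z}} E[(\nu_{n,Z}(u^*_{\varphi_{m,j}}))^2]$. According to Comte
et al. (2006),

(5.4)  $\|g - g_m^{(n)}\|^2 = \|g - g_m\|^2 + \|g_m - g_m^{(n)}\|^2 \leq \|g - g_m\|^2 + \frac{m^2 (M_2 + 1)}{k_n}$.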

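The variance term is studied by using first that for $f \in L_1(\mathbb{R})$,

(5.5)  $\nu_{n,Z}(f^*) = \int \nu_{n,Z}(e^{ix\cdot}) f(x)\,dx$.

Now, we use (5.5) and apply Parseval's formula to obtain

(5.6)  $\sum_{j \in \mathbb{Z}} E[(\nu_{n,Z}(u^*_{\varphi_{m,j}}))^2] = \frac{1}{2\pi} \int_{-\pi m}^{\pi m} \frac{E|\nu_{n,Z}(e^{ix\cdot})|^2}{|f_\varepsilon^*(x)|^2}\,dx$.

Since $\nu_{n,Z}$ involves centered and stationary variables, we have

(5.7)  $E|\nu_{n,Z}(e^{ix\cdot})|^2 = \mathrm{Var}|\nu_{n,Z}(e^{ix\cdot})| = \frac{1}{n} \mathrm{Var}(e^{ixZ_1}) + \frac{1}{n^2} \sum_{1 \leq k \neq l \leq n} \mathrm{Cov}(e^{ixZ_k}, e^{ixZ_l})$.

It follows from the structure of the model that, for $k < l$, $\varepsilon_l$ is independent of $(X_l, Z_k)$, so
that $E(e^{ixZ_l}) = f_\varepsilon^*(x) g^*(x)$ and $E(e^{ix(Z_l - Z_k)}) = f_\varepsilon^*(x) E(e^{ix(X_l - Z_k)})$. Thus, for $k < l$,

(5.8)  $\mathrm{Cov}(e^{ixZ_l}, e^{ixZ_k}) = f_\varepsilon^*(x)\, \mathrm{Cov}(e^{ixX_l}, e^{ixZ_k})$.

From (5.7) and the stationarity of $(X_i)_{i \geq 1}$, we obtain that

(5.9)  $E|\nu_{n,Z}(e^{ix\cdot})|^2 \leq \frac{1}{n} + \frac{2}{n} \sum_{k=2}^{n} |\mathrm{Cov}(e^{ixZ_1}, e^{ixX_k})|\, |f_\varepsilon^*(x)|$.

The first part of Proposition 4.1 follows from the stationarity of the $X_i$'s, and from (5.3),
(5.4), (5.6) and (5.9).

The proof of $R_m \leq \min(R_{m,\beta}, R_{m,\tau})$, where $R_{m,\beta}$ and $R_{m,\tau}$ are defined in Proposition
4.1, comes from the inequalities (2.15) in Section 2.2. Hence we get the result. □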

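5.2. Proof of Corollary 4.1. According to Butucea and Tsybakov (2005), under (1.3),
we have

$\lambda_1(f_\varepsilon, \kappa_0') \Gamma(m) (1 + o_m(1)) \leq \Delta(m) \leq \lambda_1(f_\varepsilon, \kappa_0) \Gamma(m) (1 + o_m(1))$ as $m \to \infty$, where

(5.10)  $\Gamma(m) = (1 + (\pi m)^2)^\gamma (\pi m)^{1-\delta} \exp\{2\mu (\pi m)^\delta\}$,

and where $\lambda_1$ is defined in (4.4). In the same way

$\bar{\lambda}_1(f_\varepsilon, \kappa_0') \bar{\Gamma}(m) (1 + o_m(1)) \leq \Delta_{1/2}(m) \leq \bar{\lambda}_1(f_\varepsilon, \kappa_0) \bar{\Gamma}(m) (1 + o_m(1))$ as $m \to \infty$,

where

$\bar{\Gamma}(m) = (1 + (\pi m)^2)^{\gamma/2} (\pi m)^{1-\delta} \exp(\mu (\pi m)^\delta)$ and $\bar{\lambda}_1(f_\varepsilon, \kappa_0) = \big[\kappa_0 \pi (1_{\{\delta = 0\}} + \mu\delta 1_{\{\delta > 0\}})\big]^{-1}$.

It is easy to see that $\Delta_{1/2}(m) \leq \sqrt{m \Delta(m)}$ and hence $\Delta_{1/2}(m) = \Gamma(m) o_m(1)$. Now, as
soon as $\gamma > 1$ when $\delta = 0$, $m \Delta_{1/2}(m) = \Gamma(m) o_m(1)$. Set $m_1$ such that for $m \geq m_1$ we
have

(5.11)  $0.5 \lambda_1(f_\varepsilon, \kappa_0') \Gamma(m) \leq \Delta(m) \leq 2 \lambda_1(f_\varepsilon, \kappa_0) \Gamma(m)$,

and

(5.12)  $0.5 \bar{\lambda}_1(f_\varepsilon, \kappa_0') \bar{\Gamma}(m) \leq \Delta_{1/2}(m) \leq 2 \bar{\lambda}_1(f_\varepsilon, \kappa_0) \bar{\Gamma}(m)$.

If $\sum_{k \geq 1} \beta_1(k) < +\infty$, (1.3) and (4.1) hold, and if $k_n \geq n$, then we have the upper bounds:
for $m \geq m_1$, $\lambda_1 = \lambda_1(f_\varepsilon, \kappa_0)$ and $\bar{\lambda}_1 = \bar{\lambda}_1(f_\varepsilon, \kappa_0)$,

$E \|g - \hat{g}_m^{(n)}\|^2 \leq \|g - g_m\|^2 + \frac{m^2 (M_2 + 1)}{n} + \frac{2 \lambda_1 \Gamma(m)}{n} + 8 \bar{\lambda}_1 \sum_{k \geq 1} \beta_1(k) \frac{\bar{\Gamma}(m)}{n}$

$\leq \|g - g_m\|^2 + \frac{m^2 (M_2 + 1)}{n} + \frac{2 \lambda_1 \Gamma(m)}{n} + \frac{C(\sum_{k \geq 1} \beta_1(k)) \Gamma(m)}{n} o_m(1)$.

In the same way, if $\sum_{k \geq 1} \tau_1(k) < +\infty$, if $\gamma > 1$ when $\delta = 0$, if (1.3) and (4.1) hold, and if
$k_n \geq n$, then we have the upper bound: for $m \geq m_1$,

$E \|g - \hat{g}_m^{(n)}\|^2 \leq \|g - g_m\|^2 + \frac{m^2 (M_2 + 1)}{n} + \frac{2 \lambda_1 \Gamma(m)}{n} + 2\pi \bar{\lambda}_1 \sum_{k \geq 1} \tau_1(k) \frac{m \bar{\Gamma}(m)}{n}$

$\leq \|g - g_m\|^2 + \frac{m^2 (M_2 + 1)}{n} + \frac{2 \lambda_1 \Gamma(m)}{n} + \frac{C'(\sum_{k \geq 1} \tau_1(k)) \Gamma(m)}{n} o_m(1)$.

Since $\gamma > 1$ when $\delta = 0$, the residual term $n^{-1} m^2 (M_2 + 1)$ is negligible with respect to the
variance term.

Finally, $g_m$ being the orthogonal projection of $g$ on $S_m$, we get $g_m^* = g^* 1_{[-\pi m, \pi m]}$ and
therefore

$\|g - g_m\|^2 = \frac{1}{2\pi} \|g^* - g_m^*\|^2 = \frac{1}{2\pi} \int_{|x| \geq \pi m} |g^*|^2(x)\,dx$.

If $g$ belongs to the class $S_{s,r,b}(C_1)$ defined in (1.4), then

$\|g - g_m\|^2 \leq \frac{C_1}{2\pi} (m^2 \pi^2 + 1)^{-s} \exp\{-2b \pi^r m^r\}$.

The corollary is proved. □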

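5.3. Proof of Theorem 4.1. By definition, $\tilde{g}$ satisfies that for all $m \in \{1, \ldots, m_n\}$,

$\gamma_n(\tilde{g}) + \mathrm{pen}(\hat{m}) \leq \gamma_n(g_m^{(n)}) + \mathrm{pen}(m)$.

Therefore, by using (5.2) we get

$\|\tilde{g} - g\|^2 \leq \|g_m^{(n)} - g\|^2 + 2 \nu_{n,Z}\big(u^*_{\tilde{g} - g_m^{(n)}}\big) + \mathrm{pen}(m) - \mathrm{pen}(\hat{m})$,

where $\nu_{n,Z}$ is defined in (5.1). If $t = t_1 + t_2$ with $t_1$ in $S_m^{(n)}$ and $t_2$ in $S_{m'}^{(n)}$, $t^*$ has its
support in $[-\pi \max(m, m'), \pi \max(m, m')]$ and $t$ belongs to $S_{\max(m,m')}^{(n)}$. Set
$B_{m,m'}(0,1) = \{t \in S_{\max(m,m')}^{(n)} \,/\, \|t\| = 1\}$ and write

$|\nu_{n,Z}(u^*_{\tilde{g} - g_m^{(n)}})| \leq \|\tilde{g} - g_m^{(n)}\| \sup_{t \in B_{m,\hat{m}}(0,1)} |\nu_{n,Z}(u_t^*)|$.

Using that $2uv \leq a^{-1} u^2 + a v^2$ for any $a > 1$ leads to

$\|\tilde{g} - g\|^2 \leq \|g_m^{(n)} - g\|^2 + a^{-1} \|\tilde{g} - g_m^{(n)}\|^2 + a \sup_{t \in B_{m,\hat{m}}(0,1)} (\nu_{n,Z}(u_t^*))^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat{m})$.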

Proof in the β-mixing case.

We use the coupling methods recalled in Section 2.2 to build approximating variables for the $W_i=(Z_i,X_i)$'s. More precisely, we build variables $W_i^*$ such that if $n=2p_nq_n+r_n$, $0\le r_n<q_n$, and $\ell=0,\dots,p_n-1$,

$E_\ell^* = (W^*_{2\ell q_n+1},\dots,W^*_{(2\ell+1)q_n})$, $F_\ell^* = (W^*_{(2\ell+1)q_n+1},\dots,W^*_{(2\ell+2)q_n})$,

with $E_\ell$ and $F_\ell$ denoting the corresponding blocks of the original $W_i$'s. The variables $E_\ell^*$ and $F_\ell^*$ are such that

- $E_\ell^*$ and $E_\ell$ are identically distributed, $F_\ell^*$ and $F_\ell$ are identically distributed,

- $P(E_\ell\ne E_\ell^*)\le\beta_\infty(q_n)$ and $P(F_\ell\ne F_\ell^*)\le\beta_\infty(q_n)$,

- $E_\ell^*$ and $\mathcal{M}_0\vee\sigma(E_0,E_1,\dots,E_{\ell-1},E_0^*,E_1^*,\dots,E_{\ell-1}^*)$ are independent, and therefore $E_\ell^*$ is independent of $\mathcal{M}_{(\ell-1)q_n}$; the same holds for the blocks $F_\ell^*$.

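For the sake of simplicity we assume that $r_n=0$. We denote by $(Z_i^*,X_i^*)=W_i^*$ the new couple of variables. We start from

(5.13) $\|\tilde g-g\|^2 \le \kappa_a^2\|g_m^{(n)}-g\|^2 + a\kappa_a\sup_{t\in B_{m,\hat m}(0,1)}|\nu_{n,Z}(u_t^*)|^2 + \kappa_a(\mathrm{pen}(m)-\mathrm{pen}(\hat m))$,

where $\kappa_a$ is defined in (4.10). Using the notation (5.1), we denote by $\nu^*_{n,Z}(u_t^*)$ the empirical contrast computed on the $Z_i^*$. Then we write

$\|\tilde g-g\|^2 \le \kappa_a^2\|g-g_m^{(n)}\|^2 + 2a\kappa_a\sup_{t\in B_{m,\hat m}(0,1)}|\nu^*_{n,Z}(u_t^*)|^2 + \kappa_a(\mathrm{pen}(m)-\mathrm{pen}(\hat m)) + 2a\kappa_a\sup_{t\in B_{m,\hat m}(0,1)}|\nu^*_{n,Z}(u_t^*)-\nu_{n,Z}(u_t^*)|^2$.

Set

(5.14) $T_n^*(m,m') := [\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,Z}(u_t^*)|^2 - p(m,m')]_+$.

Hence

$\|\tilde g-g\|^2 \le \kappa_a^2\|g_m^{(n)}-g\|^2 + 2a\kappa_aT_n^*(m,\hat m) + \kappa_a(2ap(m,\hat m)+\mathrm{pen}(m)-\mathrm{pen}(\hat m)) + 2a\kappa_a\sup_{t\in B_{m,\hat m}(0,1)}|\nu_{n,Z}(u_t^*)-\nu^*_{n,Z}(u_t^*)|^2$

(5.15) $\le \kappa_a^2\|g_m^{(n)}-g\|^2 + 2\kappa_a\mathrm{pen}(m) + 2a\kappa_a\sup_{t\in B_{m,\hat m}(0,1)}|\nu_{n,Z}(u_t^*)-\nu^*_{n,Z}(u_t^*)|^2 + 2a\kappa_aT_n^*(m,\hat m)$,

where pen(m) is chosen such that

(5.16) $2ap(m,m') \le \mathrm{pen}(m)+\mathrm{pen}(m')$.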
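Now write

$\nu^*_{n,Z}(u_t^*)-\nu_{n,Z}(u_t^*) = \frac1n\sum_{k=1}^n\big(u_t^*(Z_k^*)-u_t^*(Z_k)\big) = \frac{1}{2\pi}\int\big[\nu^*_{n,Z}(e^{ix\cdot})-\nu_{n,Z}(e^{ix\cdot})\big]\frac{t^*(-x)}{f_\varepsilon^*(x)}\,dx$.

Consequently,

(5.17) $E\big[\sup_{t\in B_{m,\hat m}(0,1)}|\nu^*_{n,Z}(u_t^*)-\nu_{n,Z}(u_t^*)|^2\big] \le \frac{1}{2\pi}\int_{-\pi m_n}^{\pi m_n}\frac{E|\nu^*_{n,Z}(e^{ix\cdot})-\nu_{n,Z}(e^{ix\cdot})|^2}{|f_\varepsilon^*(x)|^2}\,dx$.

Since

$E\big[|\nu_{n,Z}(e^{ix\cdot})-\nu^*_{n,Z}(e^{ix\cdot})|^2\big] = E\Big[\Big|\frac1n\sum_{k=1}^n(e^{ixZ_k}-e^{ixZ_k^*})1_{\{Z_k\ne Z_k^*\}}\Big|^2\Big] \le \frac4n\sum_{k=1}^nP(Z_k\ne Z_k^*) \le 4\beta_\infty(q_n)$,

we obtain that

(5.18) $E\big[\sup_{t\in B_{m,\hat m}(0,1)}|\nu^*_{n,Z}(u_t^*)-\nu_{n,Z}(u_t^*)|^2\big] \le 4\beta_\infty(q_n)\Delta(m_n)$.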

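By gathering (5.15) and (5.18) we get

$E\|\tilde g-g\|^2 \le \kappa_a^2\|g-g_m^{(n)}\|^2 + 2a\kappa_a\sum_{m'=1}^{m_n}E[T_n^*(m,m')] + 2\kappa_a\mathrm{pen}(m) + 2a\kappa_a\beta_\infty(q_n)\Delta(m_n)$.

Therefore we infer that, for all $m\in\{1,\dots,m_n\}$,

(5.19) $E\|\tilde g-g\|^2 \le C_a\big[\|g-g_m^{(n)}\|^2+\mathrm{pen}(m)\big] + 2a\kappa_a(C_1+C_2)/n$,

provided that

(5.20) $\Delta(m_n)\beta_\infty(q_n)\le C_1/n$ and $\sum_{m'=1}^{m_n}E(T_n^*(m,m'))\le C_2/n$.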

Using (5.11), we conclude that the first part of (5.20) is fulfilled as soon as

(5.21) $m_n^{2\gamma+1-\delta}\exp\{2\mu\pi^\delta m_n^\delta\}\beta_\infty(q_n) \le C_1'/n$.

In order to ensure that our estimators converge, we only consider models with bounded penalty, and therefore (5.21) requires that $\beta_\infty(q_n)\le C_1'/n^2$. For $q_n=[n^c]$ and $\beta_\infty(k)=O(k^{-1-\theta})$, we obtain the condition $n^{-c(1+\theta)}=O(n^{-2})$. If $\theta>3$, one can find $c\in]0,1/2[$ such that this condition is satisfied. Consequently, (5.21) holds.
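The exponent arithmetic in this step is easy to check mechanically. The sketch below (the function name is hypothetical) returns the range of block exponents $c$ for which $n^{-c(1+\theta)}=O(n^{-2})$ with $c<1/2$, that is $2/(1+\theta)\le c<1/2$; the range is empty exactly when $\theta\le3$.

```python
def beta_block_exponents(theta):
    """Return (lower, 0.5): the exponents c with lower <= c < 0.5 and c * (1 + theta) >= 2.

    Returns None when no such c exists, which happens exactly when theta <= 3.
    """
    lower = 2.0 / (1.0 + theta)
    return (lower, 0.5) if lower < 0.5 else None
```

For instance, `theta = 4` allows any `c` in `[0.4, 0.5)`, while `theta = 3` leaves no admissible value.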

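To prove the second part of (5.20), we split $T_n^*(m,m')$ into two terms

$T_n^*(m,m') = (T^*_{n,1}(m,m')+T^*_{n,2}(m,m'))/2$,

where, for $k=1,2$,

(5.22) $T^*_{n,k}(m,m') = \Big[\sup_{t\in B_{m,m'}(0,1)}\Big|\frac{1}{p_nq_n}\sum_{\ell=0}^{p_n-1}\sum_{i=1}^{q_n}\big(u_t^*(Z^*_{(2\ell+k-1)q_n+i})-\langle t,g\rangle\big)\Big|^2 - p_k(m,m')\Big]_+$.

We only study $T^*_{n,1}(m,m')$ and conclude for $T^*_{n,2}(m,m')$ analogously. The study of $T^*_{n,1}(m,m')$ consists in applying a concentration inequality to $\nu^*_{n,1}(u_t^*)$ defined by

(5.23) $\nu^*_{n,1}(u_t^*) = \frac{1}{p_nq_n}\sum_{\ell=0}^{p_n-1}\sum_{i=1}^{q_n}\big(u_t^*(Z^*_{2\ell q_n+i})-\langle t,g\rangle\big) = \frac{1}{p_n}\sum_{\ell=0}^{p_n-1}\nu^*_{q_n,\ell}(u_t^*)$.

The random variable $\nu^*_{n,1}(u_t^*)$ is considered as the sum of the $p_n$ independent random variables $\nu^*_{q_n,\ell}(u_t^*)$ defined as

(5.24) $\nu^*_{q_n,\ell}(u_t^*) = (1/q_n)\sum_{i=1}^{q_n}u_t^*(Z^*_{2\ell q_n+i}) - \langle t,g\rangle$.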
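The blocking scheme behind (5.22)-(5.24) can be illustrated with a short sketch. The helper names below are hypothetical; the code only builds the alternating index blocks $E_\ell$ and $F_\ell$ of length $q_n$ and the per-block averages, and it does not construct the coupled variables $W_i^*$ themselves.

```python
def alternating_blocks(n, q):
    """Split the indices 1..n into p alternating blocks E_l and F_l of length q.

    E_l = {2lq+1, ..., (2l+1)q} and F_l = {(2l+1)q+1, ..., (2l+2)q} for l = 0..p-1,
    with p = n // (2q). Indices beyond 2pq are returned separately as a remainder
    (the proof assumes this remainder is empty for simplicity).
    """
    p = n // (2 * q)
    e_blocks = [list(range(2 * l * q + 1, (2 * l + 1) * q + 1)) for l in range(p)]
    f_blocks = [list(range((2 * l + 1) * q + 1, (2 * l + 2) * q + 1)) for l in range(p)]
    remainder = list(range(2 * p * q + 1, n + 1))
    return e_blocks, f_blocks, remainder


def block_means(values, blocks):
    """Average of values[i - 1] over each block of 1-based indices."""
    return [sum(values[i - 1] for i in block) / len(block) for block in blocks]
```

When the remainder is empty, the overall empirical mean is the average of the mean over the $E$ blocks and the mean over the $F$ blocks, which mirrors the decomposition of $T_n^*$ into $T^*_{n,1}$ and $T^*_{n,2}$.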
Let $m^*=\max(m,m')$. Let $M_1^*(m^*)$, $v^*(m^*)$ and $H^*(m^*)$ be some terms such that $\sup_{t\in B_{m,m'}(0,1)}\|\nu^*_{q_n,\ell}(u_t^*)\|_\infty\le M_1^*(m^*)$, $\sup_{t\in B_{m,m'}(0,1)}\mathrm{Var}(\nu^*_{q_n,\ell}(u_t^*))\le v^*(m^*)$ and lastly $E(\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,1}(u_t^*)|)\le H^*(m^*)$. According to Lemma 5.2 we take

$(H^*(m^*))^2 = \frac{2\Delta(m^*)}{n}$, $M_1^*(m^*) = \sqrt{\Delta(m^*)}$ and $v^*(m^*) = \frac{2\sqrt{\Delta_2(m^*,f_Z)}}{2\pi q_n}$,

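where

(5.25) $\Delta_2(m,f_Z) = \int_{-\pi m}^{\pi m}\int_{-\pi m}^{\pi m}\frac{|f_Z^*(x-y)|^2}{|f_\varepsilon^*(x)f_\varepsilon^*(y)|^2}\,dx\,dy$.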

From the definition of $T^*_{n,1}(m,m')$, by taking $p_1(m,m') = 2(1+2\xi^2)(H^*)^2(m^*)$, we get

(5.26) $E(T^*_{n,1}(m,m')) \le E\big[\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,1}(u_t^*)|^2 - 2(1+2\xi^2)(H^*)^2(m^*)\big]_+$.

According to the condition (5.16), we thus take

$\mathrm{pen}(m) = 4ap(m,m) = 4a(2p_1(m,m)+2p_2(m,m)) = 16ap_1(m,m)$

(5.27) $= 32a(1+2\xi^2)(2n^{-1}\Delta(m)) = 64a(1+2\xi^2)n^{-1}\Delta(m)$,

where $\xi^2$ is suitably chosen. Set $m_2$ and $m_3$ as defined in Lemma 5.2 and set $m_1$ such that for $m^*\ge m_1$, $\Delta(m^*)$ satisfies (5.11). Take $m_0 = m_1\vee m_2\vee m_3$. We split the sum over $m'$ in two parts and write

(5.28) $\sum_{m'=1}^{m_n}E(T^*_{n,1}(m,m')) = \sum_{m'|m^*\le m_0}E(T^*_{n,1}(m,m')) + \sum_{m'|m^*>m_0}E(T^*_{n,1}(m,m'))$.
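The penalty in (5.27) can be evaluated numerically once $\Delta(m)$ is available. The sketch below is illustrative only. It assumes the usual deconvolution definition $\Delta(m) = \frac{1}{2\pi}\int_{-\pi m}^{\pi m}|f_\varepsilon^*(x)|^{-2}dx$, which is not restated in this section, and it uses a Laplace characteristic function purely as a hypothetical noise example because its $\Delta(m)$ has a closed form.

```python
import math


def delta_m(m, f_eps_star, steps=20000):
    """Midpoint-rule approximation of (1 / 2 pi) * int_{-pi m}^{pi m} |f_eps_star(x)|^(-2) dx.

    The integral form of Delta(m) is an assumption taken from the deconvolution
    literature; f_eps_star is the Fourier transform of the noise density.
    """
    a = math.pi * m
    h = 2.0 * a / steps
    total = 0.0
    for i in range(steps):
        x = -a + (i + 0.5) * h
        total += 1.0 / abs(f_eps_star(x)) ** 2
    return total * h / (2.0 * math.pi)


def penalty(m, n, a, xi2, f_eps_star):
    """pen(m) = 64 a (1 + 2 xi^2) Delta(m) / n, the form given in (5.27)."""
    return 64.0 * a * (1.0 + 2.0 * xi2) * delta_m(m, f_eps_star) / n


def laplace_cf(x):
    """Characteristic function 1 / (1 + x^2) of a standard Laplace law (hypothetical noise)."""
    return 1.0 / (1.0 + x * x)
```

With $\xi^2=1$ the factor $64a(1+2\xi^2)$ equals $192a$, which is the constant appearing in the case $0\le\delta<1/3$ of the proof.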

By applying Lemma 5.4 we get $E(T^*_{n,1}(m,m')) \le K[I(m^*)+II(m^*)]$, where



VA2(m%/z) r „j^,2A(m*)\ ... *^ A(m*) / 

/(m ) = ^ exp <^ -2iiri^ } , II{m ) = ^ exp <^ -2Ki(,C{0\i 

Pn [ v*{m*)} pi y \ q. 

When $m^*\le m_0$, with $m_0$ finite, we get that, for all $m\in\{1,\dots,m_n\}$,

$\sum_{m'|m^*\le m_0}E(T^*_{n,1}(m,m')) \le \frac{C(m_0)}{n}$.

We now come to the sum over $m'$ such that $m^*>m_0$. It follows from Comte et al. (2006) that

(5.29) $v^*(m^*) = \frac{2\sqrt{\Delta_2(m^*,f_Z)}}{2\pi q_n} \le 2\lambda_2(f_\varepsilon,\kappa_0)\frac{\Gamma_2(m^*)}{q_n}$,

with 



(5.30) A5(/„Ko) = KQ^^/27^Xl\\fe*\\ls<l + ls>i 

where $\lambda_1=\lambda_1(f_\varepsilon,\kappa_0)$ is defined in (4.4) and

(5.31) $\Gamma_2(m) = (1+(\pi m)^2)^{\gamma}(\pi m)^{\min((1/2-\delta/2),(1-\delta))}\exp(2\mu(\pi m)^{\delta}) = (\pi m)^{-(1/2-\delta/2)_+}\Gamma(m)$.
By combining the left-hand side of (5.11) and (5.29), we get that, for $m^*\ge m_0$,

™ I 2A2(j£,Ko) 



and //(m*) < ^-^^exp 



7 gn J 



• Study of $\sum_{m'|m^*>m_0}II(m^*)$. According to the choices for $v^*(m^*)$, $(H^*(m^*))^2$ and $M_1^*(m^*)$, we have

m'\m*>ino m'Sjl,--- ,r?i„} 

I 7 g„ J 

Since $\Delta(m_n)/n$ is bounded, $q_n=[n^c]$ with $c$ in $]0,1/2[$ ensures that

(5.32) V„„e.p(-?M£Mv^\M^jM<C 

Consequently

(5.33) $\sum_{m'|m^*>m_0}II(m^*) \le \frac{C}{n}$.

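• Study of $\sum_{m'|m^*>m_0}I(m^*)$. Denote by $\psi = 2\gamma+\min(1/2-\delta/2,1-\delta)$, $\omega = (1/2-\delta/2)_+$, and $K' = K_1\lambda_1(f_\varepsilon,\kappa_0')/(2\lambda_2(f_\varepsilon,\kappa_0))$. For $a,b\ge1$, we use that

$\max(a,b)^{\psi}e^{2\mu\pi^\delta\max(a,b)^\delta}e^{-K'\xi^2\max(a,b)^\omega} \le (a^\psi e^{2\mu\pi^\delta a^\delta}+b^\psi e^{2\mu\pi^\delta b^\delta})e^{-(K'\xi^2/2)(a^\omega+b^\omega)}$

(5.34) $\le a^\psi e^{2\mu\pi^\delta a^\delta}e^{-(K'\xi^2/2)a^\omega}e^{-(K'\xi^2/2)b^\omega} + b^\psi e^{2\mu\pi^\delta b^\delta}e^{-(K'\xi^2/2)b^\omega}$.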
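The elementary inequality (5.34) can be sanity-checked numerically. The sketch below assumes it reads $\max(a,b)^\psi e^{2\mu\pi^\delta\max(a,b)^\delta}e^{-k\max(a,b)^\omega} \le a^\psi e^{2\mu\pi^\delta a^\delta}e^{-(k/2)a^\omega}e^{-(k/2)b^\omega} + b^\psi e^{2\mu\pi^\delta b^\delta}e^{-(k/2)b^\omega}$ with $k=K'\xi^2\ge0$ and $\omega\ge0$; the function names and the parameter values used in the check are arbitrary illustrations.

```python
import math


def lhs_534(a, b, psi, mu, delta, k, omega):
    """Left-hand side: the bound evaluated at max(a, b), with k standing for K' * xi^2."""
    m = max(a, b)
    return m ** psi * math.exp(2.0 * mu * math.pi ** delta * m ** delta) * math.exp(-k * m ** omega)


def rhs_534(a, b, psi, mu, delta, k, omega):
    """Right-hand side: one term driven by a (damped in both a and b), one driven by b."""
    term_a = (a ** psi * math.exp(2.0 * mu * math.pi ** delta * a ** delta)
              * math.exp(-0.5 * k * a ** omega) * math.exp(-0.5 * k * b ** omega))
    term_b = (b ** psi * math.exp(2.0 * mu * math.pi ** delta * b ** delta)
              * math.exp(-0.5 * k * b ** omega))
    return term_a + term_b


def holds_on_grid(psi, mu, delta, k, omega, size=20):
    """Check lhs <= rhs (up to rounding) for all integer pairs 1 <= a, b <= size."""
    return all(
        lhs_534(a, b, psi, mu, delta, k, omega)
        <= rhs_534(a, b, psi, mu, delta, k, omega) * (1.0 + 1e-12)
        for a in range(1, size + 1)
        for b in range(1, size + 1)
    )
```

The first right-hand term keeps a damping factor in $b$, which is what makes the sum over $m'$ converge when $m$ is fixed.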
Consequently, 

E /(m)<E -„ 2A1(A,,»,) "1 

m'|m*>m,o m'=l 



< 



m'=l n J 

Case $0\le\delta<1/3$. In that case, since $\delta<(1/2-\delta/2)_+$, the choice $\xi^2=1$ ensures that $\Gamma_2(m)\exp\{-(K'\xi^2/2)(m)^{(1/2-\delta/2)}\}$ is bounded and thus the first term in (5.35) is bounded by $C/n$. Since $1\le m\le m_n$ with $m_n$ such that $\Delta(m_n)/n$ is bounded, the term $\sum_{m'=1}^{m_n}\Gamma_2(m')\exp\{-(K'\xi^2/2)(m')^{(1/2-\delta/2)}\}/n$ is bounded by $C'/n$, and hence

$\sum_{m'|m^*>m_0}I(m^*) \le \frac{C}{n}$.

According to (5.16), the result follows by choosing $\mathrm{pen}(m) = 4ap(m,m) = 192a\Delta(m)/n$.

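Case $\delta=1/3$. According to the inequality (5.34), $\xi^2$ is such that $2\mu\pi^\delta(m^*)^\delta-(K'\xi^2/2)(m^*)^\delta = -2\mu(\pi m^*)^\delta$, that is

$\xi^2 = \frac{16\mu\pi^\delta\lambda_2(f_\varepsilon,\kappa_0)}{K_1\lambda_1(f_\varepsilon,\kappa_0')}$.

Arguing as for the case $0\le\delta<1/3$, this choice ensures that $\sum_{m'|m^*>m_0}I(m^*)\le C'/n$. The result follows by taking $p(m,m') = 2(1+2\xi^2)\Delta(m^*)/n$, and

$\mathrm{pen}(m) = 64a(1+2\xi^2)\frac{\Delta(m)}{n} = 64a\Big(1+\frac{32\mu\pi^\delta\lambda_2(f_\varepsilon,\kappa_0)}{K_1\lambda_1(f_\varepsilon,\kappa_0')}\Big)\frac{\Delta(m)}{n}$.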

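Case $\delta>1/3$. In that case $\delta>(1/2-\delta/2)_+$. We choose $\xi^2=\xi^2(m^*)$ such that

$2\mu\pi^\delta(m^*)^\delta-(K'\xi^2/2)(m^*)^\omega = -2\mu\pi^\delta(m^*)^\delta$.

In other words

$\xi^2 = \xi^2(m^*) = \frac{16\mu\pi^\delta\lambda_2(f_\varepsilon,\kappa_0)}{K_1\lambda_1(f_\varepsilon,\kappa_0')}(m^*)^{\min((3\delta/2-1/2)_+,\delta)}$.

Hence $\sum_{m'|m^*>m_0}I(m^*)\le C/n$. The result follows by choosing $p(m,m') = 2(1+2\xi^2(m,m'))\Delta(m^*)/n$, associated to

$\mathrm{pen}(m) = 64a(1+2\xi^2(m))\frac{\Delta(m)}{n} = 64a\Big(1+\frac{32\mu\pi^\delta\lambda_2(f_\varepsilon,\kappa_0)}{K_1\lambda_1(f_\varepsilon,\kappa_0')}\,m^{\min((3\delta/2-1/2)_+,\delta)}\Big)\frac{\Delta(m)}{n}$. □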
Proof in the τ-dependent case.

We use the coupling properties recalled in Section 2.2 to build approximating variables for the $W_i=(Z_i,X_i)$'s. More precisely, we build variables $W_i^*$ such that if $n=2p_nq_n+r_n$, $0\le r_n<q_n$, and $\ell=0,\dots,p_n-1$,

$E_\ell^* = (W^*_{2\ell q_n+1},\dots,W^*_{(2\ell+1)q_n})$, $F_\ell^* = (W^*_{(2\ell+1)q_n+1},\dots,W^*_{(2\ell+2)q_n})$.

The variables $E_\ell^*$ and $F_\ell^*$ are such that

- $E_\ell^*$ and $E_\ell$ are identically distributed, $F_\ell^*$ and $F_\ell$ are identically distributed,

- $\sum_{i=1}^{q_n}E(\|W_{2\ell q_n+i}-W^*_{2\ell q_n+i}\|_{\mathbb{R}^2}) \le q_n\tau_\infty(q_n)$ and $\sum_{i=1}^{q_n}E(\|W_{(2\ell+1)q_n+i}-W^*_{(2\ell+1)q_n+i}\|_{\mathbb{R}^2}) \le q_n\tau_\infty(q_n)$,

- $E_\ell^*$ and $\mathcal{M}_0\vee\sigma(E_0,E_1,\dots,E_{\ell-1},E_0^*,E_1^*,\dots,E_{\ell-1}^*)$ are independent, and therefore $E_\ell^*$ is independent of $\mathcal{M}_{(\ell-1)q_n}$; the same holds for the blocks $F_\ell^*$.

For the sake of simplicity we assume that $r_n=0$. We denote by $(Z_i^*,X_i^*)=W_i^*$ the new couple of variables.
As for the proof in the β-mixing framework, we start from (5.15) with $T_n^*(m,\hat m)$ defined by (5.14) and pen(m) chosen such that (5.16) holds. Next we use (5.17) and the bound $|e^{ixt}-e^{ixs}|\le|x||t-s|$. Hence we conclude that



-2lqn+i 



-iX. 



It follows that 

E 



sup kn.zK) - i'n.zK)!^ 
teB,„,A(0,l) 



(5.36) 

By gathering (5.15) and (5.36) we get



< 



< 



vr 



< g'n|a;|7X,oo(Q'n) 



TX,oo('7n) 



vr 



)m„A( 

m'=l 

Therefore we infer that, for all $m\in\{1,\dots,m_n\}$, (5.19) holds provided that

(5.37) $\Delta(m_n)m_n\tau_\infty(q_n)\le C_1/n$ and $\sum_{m'=1}^{m_n}E(T_n^*(m,m'))\le C_2/n$.

Using (5.11), we conclude that the first part of (5.37) is fulfilled as soon as

(5.38) $m_n^{2\gamma+2-\delta}\exp\{2\mu\pi^\delta m_n^\delta\}\tau_\infty(q_n) \le C_1'/n$.



In order to ensure that our estimators converge, we only consider models with bounded penalty, that is $\Delta(m_n)=O(n)$. Therefore (5.38) requires that $m_n\tau_\infty(q_n)\le C_1'/n^2$. For $q_n=[n^c]$ and $\tau_\infty(k)=O(k^{-1-\theta})$, we obtain the condition

(5.39) $m_nn^{-c(1+\theta)} = O(n^{-2})$.

If $f_\varepsilon$ satisfies (1.3) with $\delta>0$, and if $\theta>3$, one can find $c\in]0,1/2[$ such that (5.39) is satisfied. Now, if $\delta=0$ and $\gamma\ge3/2$ in (1.3), and if $\theta>3+2/(1+2\gamma)$, then one can find $c\in]0,1/2[$ such that (5.39) is satisfied. These conditions ensure that (5.38) holds.
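The thresholds on $\theta$ follow from simple exponent bookkeeping, sketched below under two stated assumptions. When $\delta>0$, the constraint $\Delta(m_n)=O(n)$ forces $m_n$ to grow at most like a power of $\log n$, so its polynomial exponent is treated as 0 and the inequality on $c$ must then be strict. When $\delta=0$, $\Delta(m_n)=O(n)$ with $\Gamma(m)$ of order $m^{2\gamma+1}$ gives $m_n=O(n^{1/(2\gamma+1)})$. The function name is hypothetical.

```python
def tau_block_exponents(theta, gamma=None):
    """Return (lower, 0.5) describing the exponents c < 0.5 compatible with (5.39), or None.

    gamma=None encodes delta > 0 (m_n logarithmic, polynomial exponent taken as 0,
    so c must be strictly larger than lower). Otherwise delta = 0 and m_n is taken
    of order n^(1 / (2 gamma + 1)), so that c >= lower is required.
    """
    growth = 0.0 if gamma is None else 1.0 / (2.0 * gamma + 1.0)
    lower = (2.0 + growth) / (1.0 + theta)
    return (lower, 0.5) if lower < 0.5 else None
```

The interval is non-empty exactly when $\theta>3$ in the first case and when $\theta>3+2/(1+2\gamma)$ in the second, in agreement with the conditions stated above.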

In order to prove the second part of (5.37), we proceed as for the proof of the second part of (5.20) and split $T_n^*(m,m')$ into two terms

$T_n^*(m,m') = (T^*_{n,1}(m,m')+T^*_{n,2}(m,m'))/2$,

where the $T^*_{n,k}(m,m')$'s are defined in (5.22). We only study $T^*_{n,1}(m,m')$ and conclude for $T^*_{n,2}(m,m')$ analogously. As in the β-mixing framework, the study of $T^*_{n,1}(m,m')$ consists in applying a concentration inequality to $\nu^*_{n,1}(u_t^*)$ defined in (5.23) and considered as the sum of the $p_n$ independent random variables $\nu^*_{q_n,\ell}(u_t^*)$ defined as in (5.24). Once again, set $m^*=\max(m,m')$, and denote by $M_1^*(m^*)$, $v^*(m^*)$ and $H^*(m^*)$ the terms such that $\sup_{t\in B_{m,m'}(0,1)}\|\nu^*_{q_n,\ell}(u_t^*)\|_\infty\le M_1^*(m^*)$, $\sup_{t\in B_{m,m'}(0,1)}\mathrm{Var}(\nu^*_{q_n,\ell}(u_t^*))\le v^*(m^*)$ and lastly $E(\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,1}(u_t^*)|)\le H^*(m^*)$. According to Lemma 5.3 we take

$(H^*(m^*))^2 = \frac{2\Delta(m^*)}{n}$, $M_1^*(m^*) = \sqrt{\Delta(m^*)}$ and $v^*(m^*) = \frac{C_{v^*}\sqrt{\Delta_2(m^*,f_Z)}}{2\pi q_n}$,

where $\Delta_2(m,f_Z)$ is defined in (5.25) and where
(5.40) Cy* = 2 



]l5>o + Vri(A;)Il5=o 

^3 k>i 



From the definition of $T^*_{n,1}(m,m')$, by taking $p_1(m,m') = 2(1+2\xi^2)(H^*)^2(m^*)$, we get

(5.41) $E(T^*_{n,1}(m,m')) \le E\big[\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,1}(u_t^*)|^2 - 2(1+2\xi^2)(H^*)^2(m^*)\big]_+$.

As in the β-mixing framework we take $\mathrm{pen}(m) = 64a\Delta(m)(1+2\xi^2)/n$ where $\xi^2$ is suitably chosen (see (5.41)). Set $m_2$ and $m_3$ as defined in Lemma 5.3 and set $m_1$ such that for $m^*\ge m_1$ (5.11) holds. Take $m_0=m_1\vee m_2\vee m_3$ and $K' = K_1\lambda_1(f_\varepsilon,\kappa_0')/(C_{v^*}\lambda_2(f_\varepsilon,\kappa_0))$. The end of the proof is the same as in the β-mixing framework, up to possible multiplicative constants. □



5.4. Technical lemmas. 

Lemma 5.1. 

(5.42) 



E 



< A(m) 



The proof of Lemma 5.1 can be found in Comte et al. (2006).



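Lemma 5.2. Assume that $\sum_{k\ge1}\beta_1(k)<+\infty$. Then we have

(5.43) $\sup_{t\in B_{m,m'}(0,1)}\|\nu^*_{q_n,\ell}(u_t^*)\|_\infty \le \sqrt{\Delta(m^*)}$.

Moreover, there exist $m_2$ and $m_3$ such that

$E\big[\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,1}(u_t^*)|\big] \le \sqrt{2\Delta(m^*)/n}$ for $m^*\ge m_2$,

and

$\sup_{t\in B_{m,m'}(0,1)}\mathrm{Var}(\nu^*_{q_n,\ell}(u_t^*)) \le 2\sqrt{\Delta_2(m^*,f_Z)}/(2\pi q_n)$ for $m^*\ge m_3$,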



where A(m) and A2{m, fz) are defined by hH. ,5)) and it.5.i?.5)) . 

Proof of Lemma 5.2. Arguing as in Lemma 5.1 and by using the Cauchy-Schwarz inequality and Parseval's formula, we obtain that the first term $\sup_{t\in B_{m,m'}(0,1)}\|\nu^*_{q_n,\ell}(u_t^*)\|_\infty$ is bounded by



sup 



t&B„ 



'(0,1) 



E 

j6Z 



dx = \l A(m*). 
E



sup 

teB™,„/(o,i) 



E sup 



Pn qn 



PnQr 



=1 i=i 



V iGZ 



By using (5.6) we obtain

V iez 



1 ^" 



jez £=1 



V iez 

Now, according to (5.9) and (2.13),

1 n " ^ 

E|z.,„,i(e«-)P<- + -j;/3i(fc)|/,*(: 

This implies that 



2ttp„ 



fc=l 



n-1 



E^ 



sup 

<eB^,,„/(o,i) 



<iK)l < -J-(;^A(m*) + A^/3i(fc)Ai/2(m*; 

/c=l 



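Since $2\sum_{k\ge1}\beta_1(k)\Delta_{1/2}(m)\le\Delta(m)$ for $m$ large enough, we get that, for $m^*$ large enough,

$E^2\big[\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,1}(u_t^*)|\big] \le 2\Delta(m^*)/n$.

Now, for $t\in B_{m,m'}(0,1)$ we write

$\mathrm{Var}(\nu^*_{q_n,\ell}(u_t^*)) = \frac{1}{q_n^2}\Big[\sum_{k=1}^{q_n}\mathrm{Var}(u_t^*(Z_k)) + 2\sum_{1\le k<l\le q_n}\mathrm{Cov}(u_t^*(Z_k),u_t^*(Z_l))\Big]$.

According to (5.5), (5.8) and (2.13) we have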



|CovK*(Zfc),n:(Zz))| 



dxdy 



TKmr ■J —Trm' 

/■Km* /•nm* 
-■Kin* J —■K'm* 
■Km* /•■Km' 



< 



n{x)f*e{-y) 

/;(-j/)Cov(e"^Se^^^Or(x)t*(j/) 
2(ir{k)\e{x)e{y)\ 



dxdy 



■Km* J — Km* 



1/1(^)1 



-dxdy. 
Hence,



Var 



1 In 



i=l 



nm pTTm 



■Km* J — Trm* 

+2^/3i(fc) 

k=l 



f*(u-v)t*{u)t*{-v) 

fe{u)fe{-v) 

t*{u)t*{v 



dudv 



■Km' J — Km 



dudv 



Following Comte et al. (2006) and applying Parseval's formula, the first integral is less than $\sqrt{\Delta_2(m^*,f_Z)}/2\pi$. For the second one, write



Km fKm 



Km* J —Km* 



t*{u)t*{v) 



that is 



Km fKm 



dudv < V2wm*\\t*\\\l / \t*{v)\^dv 
t*{u)t*{v) 



dv 



.* mv)\' 



dudv < (27r)V"i*^("i*)- 



-Km* J —Km* 

Using that $\gamma>1/2$ if $\delta=0$, we get that $\sqrt{m^*\Delta(m^*)} = o_m(\sqrt{\Delta_2(m^*,f_Z)})$ and hence the result follows for $m$ large enough. □



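Lemma 5.3. Assume that $\sum_{k\ge1}\tau_1(k)<+\infty$. Assume either that

(1) $\delta=0$, $\gamma\ge3/2$ in (1.3),

(2) or $\delta>0$ in (1.3).

Then we have

(5.44) $\sup_{t\in B_{m,m'}(0,1)}\|\nu^*_{q_n,\ell}(u_t^*)\|_\infty \le \sqrt{\Delta(m^*)}$.

Moreover, there exist $m_2$ and $m_3$ such that

$E\big[\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,1}(u_t^*)|\big] \le \sqrt{2\Delta(m^*)/n}$ for $m^*\ge m_2$,

and

$\sup_{t\in B_{m,m'}(0,1)}\mathrm{Var}(\nu^*_{q_n,\ell}(u_t^*)) \le C_{v^*}\sqrt{\Delta_2(m^*,f_Z)}/(2\pi q_n)$ for $m^*\ge m_3$,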
*eB„™/(o,i) 



where A(m) and A2{m,fz) are defined by iS. ,5)) and 1^,5)) and where Cy* is defined in 

Proof of Lemma 5.3. The proof of (5.44) is the same as the proof of (5.43). Next, again as for the proof of Lemma 5.2,



E 



sup 



with 



ix.\\2 



^-^Pn J-Km* 



-dx. 
Now, according to (5.9) and (2.14),



n-l 



qn qn 



This implies that 



sup 

*e-B„.m'(0:i) 



i^n ^Hn 'in , , 

k=l 



m 



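Since $2\pi\sum_{k\ge1}\tau_1(k)\,m\Delta_{1/2}(m)\le\Delta(m)$ for $m$ large enough, we get that for $m^*$ large enough

$E^2\big[\sup_{t\in B_{m,m'}(0,1)}|\nu^*_{n,1}(u_t^*)|\big] \le 2\Delta(m^*)/n$.

Now, for $t\in B_{m,m'}(0,1)$ we write

$\mathrm{Var}\Big(\frac{1}{q_n}\sum_{i=1}^{q_n}u_t^*(Z_{2\ell q_n+i})\Big) = \frac{1}{q_n^2}\Big[\sum_{k=1}^{q_n}\mathrm{Var}(u_t^*(Z_k)) + 2\sum_{1\le k<l\le q_n}\mathrm{Cov}(u_t^*(Z_k),u_t^*(Z_l))\Big]$.

According to (5.5), (5.8) and (2.14), and by applying the same arguments as for the proof of Lemma 5.2, we have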



\CoY{uUZk),uUZi))\ 



< 



Trm' J —Trm 



Hence, 



qn 



Var -^n:(Z2V+J < 



Trm /-Trm 



1/1(^)1 

* f*(u-v)t*iu)t*{-v) 



in ^ J —Trm* J —TTin* 
Qn 



fe{u)fe{-v) 

ut*{u)t*(v 



dudv 



1" /•wm i-irm 

k=l J~TTm* J- 



Trm' J — Trm' 



dudv I . 



Once again the first integral is less than $\sqrt{\Delta_2(m^*,f_Z)}/2\pi$. For the second one, write



Trm /"Trm 



-Trm* J —Trm," 

that is 



ut*{u)t*{v) 



n{v) 



dudv < (m*)3/2||f |L / / \t*{v)\'^dv 



^/3 



dv 



■Km /•Trm' 



Trm* J —-Km* 



t*{u)t*{v) 



dudv < {2Txfl'^^J{m*YA{m*). 
If $\delta>0$, then $\sqrt{(m^*)^3\Delta(m^*)} = o_m(\sqrt{\Delta_2(m^*,f_Z)})$. If $\gamma>3/2$ and $\delta=0$, we get that $\sqrt{(m^*)^3\Delta(m^*)} = o_m(\sqrt{\Delta_2(m^*,f_Z)})$. Lastly, if $\gamma=3/2$ and $\delta=0$, we get that $\sqrt{(m^*)^3\Delta(m^*)} \le \sqrt{\Delta_2(m^*,f_Z)}$ and the result follows for $m$ large enough. □

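Lemma 5.4. Let $Y_1,\dots,Y_n$ be independent random variables and let $\mathcal{F}$ be a countable class of uniformly bounded measurable functions. Then for $\xi^2>0$,

$E\Big[\sup_{f\in\mathcal{F}}|\nu_{n,Y}(f)|^2 - 2(1+2\xi^2)H^2\Big]_+ \le \frac{4}{K_1}\Big(\frac{v}{n}e^{-K_1\xi^2\frac{nH^2}{v}} + \frac{98M_1^2}{K_1n^2C^2(\xi^2)}e^{-\frac{2K_1C(\xi^2)\xi}{7\sqrt2}\frac{nH}{M_1}}\Big)$,

with $C(\xi^2) = \sqrt{1+\xi^2}-1$, $K_1=1/6$, and

$\sup_{f\in\mathcal{F}}\|f\|_\infty\le M_1$, $E\big[\sup_{f\in\mathcal{F}}|\nu_{n,Y}(f)|\big]\le H$, $\sup_{f\in\mathcal{F}}\frac1n\sum_{k=1}^n\mathrm{Var}(f(Y_k))\le v$.

This inequality comes from a concentration inequality in Klein and Rio (2005) and arguments that can be found in Birgé and Massart (1998). Usual density arguments show that this result can be applied to the class of functions $\mathcal{F}=B_{m,m'}(0,1)$.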

Proof of Proposition 2.1. To prove (1), let for $t>0$, $Y_t^*=\eta_t\sigma_t^*$. Note that the sequence $((Y_t^*,\sigma_t^*))_{t\ge1}$ is distributed as $((Y_t,\sigma_t))_{t\ge1}$ and independent of $\mathcal{M}_i=\sigma(\sigma_j,Y_j,0\le j\le i)$. Hence, by the coupling properties of τ (see (2.12)), we have that, for $n+i\le i_1<\dots<i_l$,

1 ' 

r(M, (y,^4), . . . , {YlaD) <jJ2 - {{Y*f, K.))'|Ir^ < , 

i=i 

and (1) follows. 

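To prove (2), define the function $f_\epsilon(x) = \ln(x)1_{x>\epsilon}+2\ln(\epsilon)1_{x\le\epsilon}$ and the function $g_\epsilon(x) = \ln(x)-f_\epsilon(x)$. Clearly, for any $\epsilon>0$ and any $n+i\le i_1<\dots<i_l$, we have

(5.45) $\tau(\mathcal{M}_i,(Z_{i_1},X_{i_1}),\dots,(Z_{i_l},X_{i_l})) \le 2E(|g_\epsilon(Y_0^2)|+|g_\epsilon(\sigma_0^2)|) + \tau(\mathcal{M}_i,(f_\epsilon(Y_{i_1}^2),f_\epsilon(\sigma_{i_1}^2)),\dots,(f_\epsilon(Y_{i_l}^2),f_\epsilon(\sigma_{i_l}^2)))$.

For $0<\epsilon<1$, the function $f_\epsilon$ is $1/\epsilon$-Lipschitz. Hence, applying (1),

$\tau(\mathcal{M}_i,(f_\epsilon(Y_{i_1}^2),f_\epsilon(\sigma_{i_1}^2)),\dots,(f_\epsilon(Y_{i_l}^2),f_\epsilon(\sigma_{i_l}^2))) \le \frac{\delta_n}{\epsilon}$.

Since $\max(f_{\sigma^2}(x),f_{Y^2}(x))\le C|\ln(x)|^{\alpha}x^{-\rho}$ in a neighborhood of 0, we infer that for small enough $\epsilon$,

$E(|g_\epsilon(Y_0^2)|+|g_\epsilon(\sigma_0^2)|) \le K_1\epsilon^{1-\rho}|\ln(\epsilon)|^{1+\alpha}$,

for $K_1$ a positive constant. From (5.45), we infer that there exists a positive constant $K_2$ such that, for small enough $\epsilon$,

$\tau(\mathcal{M}_i,(Z_{i_1},X_{i_1}),\dots,(Z_{i_l},X_{i_l})) \le K_2\Big(\frac{\delta_n}{\epsilon}+\epsilon^{1-\rho}|\ln(\epsilon)|^{1+\alpha}\Big)$.

The result follows by taking $\epsilon = (\delta_n)^{1/(2-\rho)}|\ln(\delta_n)|^{-(1+\alpha)/(2-\rho)}$.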






Now, we go back to the model (|2.5|1 . If Xlj^i '^j < ^^e unique stationary solution to 
H2.5|l is given by Giraitis et al. (2000): 

oo oo 

i=iji,...,ji=i 

for any 1 < A; < n, let 

[n/k] k 

a^{k, n) = a + a ^ ^ aj^ . . . aj^ritj^ . . . Vt-{n+-+ny 

Clearly 

E{\al-{a*J^\)<2E{\al-al{k,n)\). 

Now 

oo 

E{\al-al{k,n)\)<[ ^ + E E «i) • 

«=[n/A:]+l 1=1 j>k 

This being true for any $1\le k\le n$, the proof of Proposition 2.1 is complete.